SYSTAT has a powerful DATA facility for reading and manipulating
data. DATA is a powerful alternative to the Data Editor and its
transformation facilities. Unlike the Data Editor, which is fully menu-driven,
DATA is a complete programming language. DATA has very powerful 
relational database facilities and SYSTAT BASIC, a full-featured im- 
plementation of BASIC. Do not explore this volume, however, until you 
are comfortable with the Data Editor (see the Getting Started volume) 
and have tried some statistical analyses. 


Here are some things you can do in either the Editor or with DATA. 


                   Data Editor                DATA

Enter data
  from keyboard    type in cell               INPUT...
  from text file   Paste from Clipboard       GET...
  from Excel file  Import...                  (none)
Add cases          Paste from Clipboard or    APPEND...
                   type in columns
Drop cases         Delete...                  IF... THEN DELETE
Add variables      Paste from Clipboard or    USE file1 file2
                   type in columns
Drop variables     Delete...                  DROP...


In general, you should use the Editor for simple or smaller tasks 
(transformations on a few variables) and consider DATA for large tasks 
(transformations on many variables). 





Overview 




This volume discusses the following tasks. 





Entering data (Chapter 2) 
You can enter data from the keyboard or read it from plain text
(ASCII) files. In either case, the data may be either free-format, where
each value is separated by a comma or a space, or both, or fixed- 
format, where each value appears in the same place in every row 
(for instance, the first value always starts with the first character of 
each line, the second value starts on the fifth character, etc.). 


Printing data files (Chapter 3) 
You can view your data file in its entirety or view and print just
certain cases of certain variables.


Saving data in text files (Chapter 4) 
You can save data in plain text files suitable for exporting to other 
applications or platforms. In so doing, you can also change variables 
from numeric to character, rearrange the file, and reduce the file 
size. 


Rearranging and combining files (Chapter 5) 
Just as you can stick files together in the Data Editor by using Cut 
and Paste, you can use DATA to combine two files side-by-side or 
one on top of the other. You can rearrange the variables, drop vari- 
ables, delete cases, and transpose files. You can do all of these things 
(except rearrange variables) with the regular Data Editor, but for 
large files, DATA is more efficient. 


Transforming variables (Chapter 6)
DATA allows you to create new variables using a wide range of 
mathematical functions. Statements like IF... THEN let you do condi- 
tional transformations quickly. You can recode variables with a sim- 
ple CODE statement and add value labels with a LABEL statement. 
Some of these features are available in the Data Editor, but DATA 
handles a large number of variables more efficiently. 
Programming in SYSTAT (Chapter 7)
DATA contains SYSTAT’s version of the BASIC (Beginner’s
All-purpose Symbolic Instruction Code) programming language. BASIC
statements like IF... THEN and FOR...NEXT let you do complex 
transformations quickly. 


Sorting, ranking, and standardizing (Chapter 8) 
You can sort, rank, and standardize files with the Data Editor, but 
with DATA you can do more complex variations like computing 
Winsorized and trimmed means, normal scores, medians, etc. 


Subgroup processing (Chapter 9) 
DATA has four built-in grouping variables: beginning of file, end of 
file, beginning of group, and end of group. You can use these
variables to process your data in subgroups, for example to compute
sums, means, medians, etc. within groups. 


Programming examples (Chapter 10) 
In Chapter 10 we present many examples of more advanced DATA 
applications using principles learned in earlier chapters. Among
other things, we show how to do operations across rows in a dataset 
(for instance, computing means across variables within cases) and 
how to generate various kinds of random data. 


Command reference (Appendix I)
At the beginning of each chapter (2-10), we present a “Command 
reference” with concise information about the syntax and effect of 
each DATA command that is introduced in that chapter. Appendix I
collects these references into a single, alphabetized digest of
commands. The reference begins with an even terser list of commands
with one line descriptions; you might turn to the first page now to get 
a feel for the types of commands you'll learn about. 


SYSTAT file structure (Appendix II)
We present technical information about the way SYSTAT stores and 
uses data files. This information will be of use only to the most ad- 
vanced users. 








General usage 




Using DATA 


DATA is entirely command driven. The rest of SYSTAT and 
SYGRAPH is menu-driven with a mouse interface, but it, too, has an
optional command interface that you can use at any time. If you need to
keep a record of your work, all you have to do is open the Command 
window. While you use the menus, SYSTAT generates the command 
equivalents of your actions. You can save the commands in a file, edit 
them, and resubmit them in two weeks to grind through another analy- 
sis. 


Start SYSTAT. The next thing you need to do is open the Command 
window. 


• Select Show command window from the Goodies menu






[Figure: the Goodies menu, whose items include Information, Show view window, Show command window, Plot tools and colors, Show options in effect, and Record preferences]


Notice the greater-than sign (>) in the corner. This is a prompt. It tells 
you that SYSTAT is ready for you to give a command. 







Now, type DATA. This tells SYSTAT that you want to use the DATA 
facilities. Press Return at the end of any DATA command, including the 
DATA command. 


© 1989, SYSTAT, Inc. 





>DATA
WORKSPACE CLEAR FOR CREATING NEW DATASET
>


Now you're ready to go. Type commands after the prompt (>), one
command per prompt, and when you are ready to execute the commands,
enter RUN after the prompt. The following chapters will introduce
you to the commands. Remember, the commands will not work
until you type RUN.


Notation


In the Command reference appendices for SYSTAT and SYGRAPH, 
we explain command syntax and notation in great detail. If at any time 
you are unsure of the notation we use in the command reference sec- 
tions for DATA, just check there. 


In the meantime, you only need to know about one convention we use 
in displaying commands: placeholders. Placeholders are words or symbols
that we show in place of the actual things you would type when using 
commands. 


For example, the first thing you usually do is open a file. The USE 
command tells DATA to open the file you name after the word USE. 


USE MEDICAL 


Here, we ask DATA to open the file named “MEDICAL.” You could 
name any file after USE. 


USE BASEBALL 
USE US 
USE USDATA 


If you have a file in a different folder, you would give the entire path- 
name for that file, enclosed in single or double quotation marks: 


USE 'HARDDISK:SYSTAT:Data Files:BASEBALL'
USE "FLOPPY:MYDATA"


You also need to use quotation marks if your filename (or any folder
contained in the pathname) contains spaces or symbols. If your file (or
any folder named in the path) contains single or double quotation
marks, surround the whole thing with the opposite type of marks:


USE "Tom's data"


As you can see, you could type many different things after USE. Instead
of trying to list all the possibilities, we just use the placeholder filename:


USE filename


Here, filename means “any valid filename or pathname, including
quotation marks if needed.”


You can distinguish placeholders from actual command words because
we always print placeholders in italic, lower-case letters. Other
placeholders you'll see frequently are varlist, which means one or more
variables, and n, which means some number. Usually we give rules, like
“n can be any positive integer between 1 and 100.”


We use square brackets [ and ] to indicate things that are optional. For
instance, you can optionally specify variables for the LIST command:


LIST [varlist]


The brackets mean that you can list specific variables if you want, or you
can press Return right after the word LIST. Both of the following
would be valid commands:


LIST
LIST ACCIDENT, CARDIO, CANCER


Other abbreviations and notation conventions should become obvious as
you study the examples.


HOT and COLD

Finally, you will occasionally see the word HOT in this volume. HOT
commands produce output immediately after they are entered. COLD
commands, on the other hand, set options or switches. All HOT
commands are labeled with the word HOT in the reference lists. The most
common HOT command is the RUN command.
Command reference


Basics 
Types of data 
Numeric values 
Character values 
Free- and fixed-format input 


Free-format input 
2.1 Keyboard input 
2.2 Reading from a text file 
2.3 Unequal length records 
2.4 Records with extraneous data values 
2.5 Incorrect treatment of missing values 
2.6 Correct treatment of missing values 
2.7 Multiple cases per record: backslash 
2.8 Incomplete records: backslash 
2.9 Incorrect use of backslash 


Fixed-format input 
Formats 
2.10 Simple example 


Entering triangular matrices 
TYPE 
2.11 Entering a covariance matrix 
2.12 Entering a matrix with missing diagonal 


ASCII files
Importing files from other applications 
Trouble-shooting 
Errors and error messages 
2 Entering data


Overview

This chapter shows you how to create SYSTAT files by typing data
from the keyboard and by reading plain text (ASCII) files.


Note that you can enter data from the keyboard in the regular Data 
Editor, and Import... from the File menu lets you read plain text files 
and Microsoft Excel files into the Editor. See the Getting Started manual.


For entering data from the keyboard, the only reason you might want to 
use DATA would be if you prefer typing spaces to pressing Return
between values. For reading ASCII files, unless you require the special
control of fixed-format input (see below), you do not need to use 
DATA. 


Command reference


DIAGONAL=PRESENT | ABSENT     Specifies whether the matrix you are
                              entering has values in the diagonal cells.
                              The diagonal is assumed to be present
                              unless you state otherwise with
                              DIAGONAL=ABSENT.

GET filename                  Reads the ASCII (plain text) file filename.

IMPORT filename [(varlist)]   Translates filename into SYSTAT format.
                              You can optionally include varlist if you
                              want to import only certain variables
                              from the file.
  / TEXT | MAP | EXCEL        Specifies importing format. TEXT, the
                              default, reads plain text (ASCII) files. MAP
                              reads map data into .MAP files (see
                              Appendix V in the SYGRAPH volume for
                              information about map files). EXCEL
                              reads Microsoft Excel files.

INPUT varlist                 Names the variables (and indicates order)
                              that will be read into SYSTAT. You may
                              identify a range of variables in varlist
                              using subscript notation.
  \                           For free-format input, place a backslash
                              after varlist to force SYSTAT to start a new
                              case for each line of data and to use
                              every value entered in each row, even if it
                              must start filling new cases to do so. See
                              Example 2.8.

INPUT (varlist) (format)      For fixed-format input, INPUT has two
                              arguments, each enclosed in parentheses.
                              As above, varlist indicates the variable
                              names, in order. Format is a format
                              description in special notation, discussed in
                              this chapter.
RUN                           Executes commands. HOT.

SAVE filename                 Saves data into the SYSTAT file filename.
  / 'comment'                 Saves your comments in the file.
  DOUBLE | SINGLE             Specifies whether to save in double or
                              single precision.

TYPE = RECTANGULAR | SSCP |   Specifies the type of matrix you are
  COVARIANCE | CORRELATION |  entering. Use DIAGONAL=ABSENT if the
  DISSIMILARITY | SIMILARITY  diagonal values are missing.
Basics


DATA provides two basic ways to enter your data into SYSTAT: 


1) Read data from a text (ASCII) file.
2) Type values from the keyboard directly.


There are three commands that you will always use when creating a 
SYSTAT file: SAVE, INPUT and RUN. SAVE names the SYSTAT 
file you are creating. INPUT names the variables you are reading. RUN 
sets the procedure in motion. In addition, the GET command identifies 
the location of a text data file you wish to enter into SYSTAT. 


Keyboard input 
SAVE filename 
INPUT varlist
RUN 


Input data one case at a time after prompt arrow 
> 


File input 
SAVE filename 
INPUT varlist
GET filename 
RUN 


Types of data

SYSTAT accepts numbers and characters as data. Numeric variables
contain numbers, and character variables (denoted with a $ at the end of 
their name) contain character strings. 


Numeric values

Numeric values should have no more than 12 digits before or after the
decimal point, and a total of no more than 15 digits. You can also
separate numeric values with a slash. This allows you to read dates (e.g.
11/5/44) into several variables (e.g. MONTH,DAY, YEAR). 
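
The slash rule can be illustrated with a short Python sketch. This is not SYSTAT code, only a hedged simulation of the splitting behaviour described above; the helper name `split_free_format` is hypothetical, and quoted character values are deliberately not handled.

```python
import re

def split_free_format(record):
    """Split one free-format record on spaces, tabs, commas, or slashes.

    Assumption: quoted strings are not handled in this sketch.
    """
    return [tok for tok in re.split(r"[ \t,/]+", record.strip()) if tok]

# The date from the text, read into three numeric variables.
month, day, year = (float(tok) for tok in split_free_format("11/5/44"))
print(month, day, year)
```

Under these assumptions the date arrives as three separate numeric values, which is the effect the text attributes to `INPUT MONTH, DAY, YEAR`.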


13579
.
.326
123456789.
1.9E4
1.9e-4
.8D15
123456789.123456789


Scientific notation 


Two of the numbers contain the letter E. These numbers are in scien- 
tific notation. 


1.9E4 = 1.9 * 10^4 = 19000
1.9E-4 = 1.9 * 10^-4 = .00019


The number containing D works the same way. This is a “double preci- 
sion” exponent printed by some computer languages. You can use any 
integer exponent with absolute magnitude less than or equal to 35. 


The last example contains 18 decimal digits. Because SYSTAT inputs 
numbers only up to 15 decimal digits, this number rounds off to 
123456789.123457. 
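
As a rough illustration of these rules, the Python sketch below parses the example values. It is an approximation written for this discussion, not SYSTAT's actual routine: the function name `parse_numeric` is hypothetical, the missing value is represented as `None`, and the limit on exponent magnitude is not enforced.

```python
def parse_numeric(token):
    """Parse one numeric data token as described in the text.

    Assumptions: a lone period is returned as None (missing), and the
    35-exponent limit is not checked.
    """
    if token == ".":
        return None
    # A "D" exponent marker works the same way as "E".
    value = float(token.upper().replace("D", "E"))
    # Keep 15 significant digits, mirroring the stated input precision.
    return float(f"{value:.15g}")

for tok in ["1.9E4", "1.9e-4", ".8D15", "123456789.123456789", "."]:
    print(tok, "->", parse_numeric(tok))
```

The design choice of rounding through a 15-significant-digit string is only one way to mimic the stated precision; the real storage format may differ.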


Numeric missing values 


The period (.) on the second line of our example denotes a missing 
value. Missing numeric values are set to a number smaller than the 
smallest value used in any calculations. All the statistical modules rec- 
ognize this value and exclude it from computations. If you want to de- 
note a missing value when typing from the keyboard or entering data 
from a file, make sure you use the period. Otherwise, SYSTAT will look 
for the next value you type and not realize a value was missing. 


Numeric variable names 


Numeric variable names are 1 to 8 letters and/or numbers beginning
with a letter. They may be subscripted. 
Character values

Character variables may contain strings up to 12 characters long. They
should be separated by blanks and/or commas and may be surrounded 
by quotation marks (' or "). The values must be surrounded by quotation 
marks if they contain embedded blanks and/or special characters. If 
more than 12 characters exist in the input value, the value is truncated to 
the first 12. If fewer than 12 characters exist, blanks are inserted at the 
end. Here are some examples:


MALE 
'New York'
ANTIDISESTABLISHMENTARIANISM 


The last string in this example is truncated to ANTIDISESTAB when it
is stored. If there are more variables to be read from the line, SYSTAT 
begins with LISHMENTARIANISM. This causes an error when 
DATA is expecting a numeric value. If the next variable to be read is a 
character variable, you get no error message—you just get a messed up 
data file. Be careful. 
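
The truncation rule is easy to check with a few lines of Python. This is a sketch under stated assumptions, not SYSTAT's implementation; `WIDTH` and `store_character` are hypothetical names.

```python
WIDTH = 12  # maximum length of a character value, per the text

def store_character(value):
    """Return (stored, leftover) for one unquoted character value.

    The stored part is truncated or blank-padded to WIDTH characters;
    the leftover is what remains on the line for the next variable.
    """
    stored = value[:WIDTH].ljust(WIDTH)
    leftover = value[WIDTH:]
    return stored, leftover

print(store_character("ANTIDISESTABLISHMENTARIANISM"))
print(store_character("MALE"))
```

The leftover text is what makes the long string dangerous: it is still waiting on the line when the next variable is read.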


Character missing values 
Missing values for character variables are denoted with a blank
surrounded by quotation marks: (" ") or (' ').


Character variable names 
Remember to use dollar signs with character variable names. If you try 
to read character data into a numeric variable, SYSTAT prints an error 
message and lists the data it was unable to process. 


Free- and fixed-format input

There are two types of input. Free-format input works with delimited
data (data where each value is separated by spaces or commas), and fixed-
format input works with data where the values of variables are in the 
same locations in each record. Each is discussed in more detail below. 
Free-format input


After seeing a RUN command, SYSTAT looks for data values. If you
typed GET before RUN, SYSTAT looks for the file you named and 
reads the data values from it. Otherwise, SYSTAT expects you to type 
the values on the keyboard. 


Whether typed from the keyboard or read from a file, data values should 
be separated by tabs, commas, or spaces. Each new case should begin on 
a separate line (press the Return key to start a new line). You may read 
several lines of values into a single case as long as the next case begins on 
a new line. 


Character values that contain blanks, commas, or special characters 
must be surrounded by single or double quotation marks (' or "). Miss- 
ing values must be represented by a period (.) for numeric variables or a 
blank surrounded by quotation marks (" " or ' ') for character variables.
A tab followed by another tab or comma followed by another comma
will not be read as a missing value.


SYSTAT continues reading the data until it encounters the end of a file, 
a tilde (~) sign or, if it is expecting numeric data, a non-numeric string. 
In general, this means that you can end a batch of (numeric) data by typ- 
ing another command. When in doubt, use a tilde to end the input ex- 
plicitly. 


To read data from a text file, add a GET filename command before the 
RUN command. GET tells SYSTAT that the data you want to read is lo-
cated in an ASCII file. The file must be plain ASCII text, containing no 
page breaks, margin indicators or control characters. It must contain 
only raw data, with no column headings or variable labels. 
2.1 Keyboard input

Create a small SYSTAT file by entering the following commands in
DATA:


SAVE MYFILE
INPUT A B C
RUN


Input data one case at a time after prompt arrow
>


Now enter data one case to a line:


[data values illegible in the source: three lines of three values each, followed by a tilde (~)]


The last character is a tilde that tells SYSTAT to end data input.
SYSTAT responds:


3 cases and 3 variables processed.
SYSTAT file created.


You have just entered data from the keyboard and saved it to a SYSTAT
data file.


2.2 Reading from a text file

The procedure is similar for reading data from a text file. Suppose the 
data you typed in above is in a text file called INFILE. To import it, you 
would enter these commands: 


GET INFILE 
SAVE MYFILE 
INPUT A B C
RUN 


3 cases and 3 variables processed. 
SYSTAT file created. 


The only difference between this and the first example is that you use 
the GET command to tell SYSTAT that the data is coming from a text 
file. 
The ASCII file must not contain nonprinting ASCII characters. There
must be no page breaks, control characters, column markers, margin
indicators, etc. SYSTAT can read numbers, alphabetic and keyboard
characters, delimiters (spaces, commas, or slashes that separate
consecutive values from each other), and carriage returns.


2.3 Unequal length records


For each new case, SYSTAT reads as many data values as are named in 
the INPUT command, one value per variable. This example shows what 
happens when each input record has a different number of data values. 
Enter the following commands in DATA: 


NEW 

SAVE MYFILE 
INPUT A B C D
RUN 


Input data one case at a time after prompt arrow 
> 


Now enter the following data, separating the values with spaces: 


10 20 30 40 

50 60 

70 80 

90 100 110 120 


3 cases and 4 variables processed 
SYSTAT file created.


To view the contents of this new SYSTAT file, type: 


LIST 
RUN 
              A          B          C          D
Case 1     10.000     20.000     30.000     40.000
Case 2     50.000     60.000     70.000     80.000
Case 3     90.000    100.000    110.000    120.000
3 cases and 4 variables processed 


No SYSTAT file created. 
Line by line, here is what SYSTAT has done:


Case 1. SYSTAT reads the four data points from the first record of
original data into the first case of the SYSTAT file.


Case 2. SYSTAT now reads the two data points from the second record
(50 and 60) into variables A and B, respectively, in case 2 of the
SYSTAT file.


SYSTAT must still fill variables C and D for case 2. It therefore reads
the data points 70 and 80 from the third record of original data into C
and D in the second case of the SYSTAT file.


Case 3. SYSTAT now begins a new case in the SYSTAT file and so
proceeds to the next record of original data. It reads the four data points
90, 100, 110, and 120 into case 3 of the SYSTAT file.


This example would work the same way if you entered these data from a
file.


2.4 Records with extraneous data values


For each new case, SYSTAT begins on a new line to read as many data 
values as are named in the INPUT command. This example demon- 
strates what happens for a case when there are more values left on a 
record than needed to fill variables named in the INPUT command. 


NEW 

SAVE NEWFILE 
INPUT A B C D
RUN 


At the subsequent prompt, enter the following records of data: 


10 20 30 
40 50 60 70 80 
90 100 110 120 130 140 


2 cases and 4 variables processed 
SYSTAT file created. 


Note that SYSTAT processed the three records into two cases. List the 
file: 


LIST 
RUN 
              A          B          C          D
Case 1     10.000     20.000     30.000     40.000
Case 2     90.000    100.000    110.000    120.000
2 cases and 4 variables processed 


No SYSTAT file created. 


What has happened to the original data? SYSTAT did not read the val- 
ues 50, 60, 70, 80, or 130 and 140 into the SYSTAT file. 




Case 1. SYSTAT reads the values 10, 20, and 30 from the first record 
of original data into variables A, B, and C, respectively. SYSTAT still
needs a value for D in case 1, so it takes the value 40 from the second 
record of data and puts it there. 


Case 2. SYSTAT now starts a new case (case 2) in the SYSTAT file. It 
assumes that a new case in the SYSTAT file corresponds with a new 
record of original data, so it jumps down to the third line of data, 
thereby skipping the numbers 50, 60, 70, and 80. 


The value 90 becomes the first number for case 2 of the SYSTAT file. 
SYSTAT reads the next three data points from record three of original 
data into variables B, C, and D. Since SYSTAT has completed the case,
it never reads the numbers 130 and 140.


2.5 Incorrect treatment of missing values

SYSTAT represents missing numeric data as a period (.). (Internally,
SYSTAT codes missing values as the smallest possible negative value.) It
treats missing character data as blanks.


Code missing numeric data as periods in your original data. You cannot
code missing numeric data as a character value such as NA, M, *, or ?
because SYSTAT will not read character data into a numeric field. Also,
if missing numbers are left as blank spaces in your original data,
SYSTAT reads the blank space as a delimiter. It places the next value in
the file where the missing value should be, and all subsequent data are
displaced (see example below).





Code missing character data as a blank space enclosed in single or double 
quotation marks, e.g. " ". Do not merely leave values blank. 
The following example demonstrates what happens when you do not
code missing values as periods.


SAVE NEWFILE
INPUT A, B, C
RUN


Enter these data:


100 200 300
400     600
700 800 900


SYSTAT produces the following incorrect file:


              A          B          C
Case 1    100.000    200.000    300.000
Case 2    400.000    600.000    700.000


Instead of three cases, SYSTAT produces two, skipping over the values
800 and 900. This is similar to Example 2.4. SYSTAT reads the first
line of data correctly, but treats the missing value in the second line as a
space delimiter separating the values 400 and 600. SYSTAT therefore
places 600 under variable B in case 2 of the SYSTAT file. It
completes case 2 by reading 700, the first value of the next line of raw data.
Now, since SYSTAT starts a new case in the SYSTAT file, it jumps to
the next line of original data. There are no more lines of data to read in,
however, so SYSTAT closes the file.


You could read these data with fixed-format input (see below). The next
example, however, shows how to do it successfully with free-format
input.


2.6 Correct treatment of missing values


If we correctly code the missing data point as a period in our original 
data: 


100 200 300 


400 . 600 
700 800 900 


SYSTAT produces the following correct SYSTAT file: 
              A          B          C
Case 1    100.000    200.000    300.000
Case 2    400.000       .       600.000
Case 3    700.000    800.000    900.000


SYSTAT reads the data into the appropriate columns and cases and
codes the missing value as a period. If the missing data value had been in
a character variable, then we would have used a quoted blank (" ")
instead.


2.7 Multiple cases per record: backslash


If you want to read more than one case per line, append the backslash 
(\) to your INPUT statement. The backslash forces SYSTAT to use all 
the data in every row, even if it has to start filling a new case to finish 
using the row of values. Also, the backslash forces SYSTAT to start a 
new case whenever it starts reading a new row of values. 


Recall that, without a backslash, SYSTAT skips over any extra values in 
a row, and it fills every case, even if it has to read several lines of data to 
do so. 


In other words, the backslash forces SYSTAT to use all of the values 
you enter and to pay attention to your line breaks. 
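
The two reading modes can be compared with a small Python simulation. This is a hedged sketch of the behaviour described in this chapter, not SYSTAT's own code. The names `tokens`, `convert`, and `read_free_format` are hypothetical; quoted strings are not handled; and two details are assumptions rather than documented behaviour: blank records are skipped, and a final case that cannot be filled is dropped.

```python
import re

def tokens(line):
    """Split a record on spaces, tabs, or commas (quoted strings not handled)."""
    return [t for t in re.split(r"[ \t,]+", line.strip()) if t]

def convert(tok):
    """'.' is missing (None); numbers become floats; anything else stays text."""
    if tok == ".":
        return None
    try:
        return float(tok)
    except ValueError:
        return tok

def read_free_format(lines, nvars, backslash=False):
    """Return a list of cases, each a list of nvars values."""
    records = []
    for line in lines:
        if line.strip() == "~":      # a tilde ends the input explicitly
            break
        toks = [convert(t) for t in tokens(line)]
        if toks:                     # assumption: blank records are skipped
            records.append(toks)

    cases = []
    if backslash:
        # Use every value; a line break always closes the current case,
        # padding it with missing values if it is short.
        for rec in records:
            for i in range(0, len(rec), nvars):
                chunk = rec[i:i + nvars]
                cases.append(chunk + [None] * (nvars - len(chunk)))
        return cases

    # Without a backslash: each case starts on a new record, spills onto
    # following records until it is full, and ignores leftover values.
    r = 0
    while r < len(records):
        case = []
        while r < len(records) and len(case) < nvars:
            case.extend(records[r][:nvars - len(case)])
            r += 1
        if len(case) == nvars:       # assumption: a partial last case is dropped
            cases.append(case)
    return cases

data = ["1 2 3 4 5", "6 7", "8"]
print(read_free_format(data, 3))                  # no backslash
print(read_free_format(data, 3, backslash=True))  # with backslash
```

Running the same three records through both modes shows the trade-off: without the backslash some values are silently skipped, while with it short lines are padded with missing values.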


This example shows how to use the backslash to read data where you 
have more than one case per line of original data.


NEW 

SAVE MULTIPLE 
INPUT NAME$, AGE \ 
RUN 


TOM 23 JERRY 51 MARILYN 50 LYNNE 18 
MARK 22 ANDREW 8 HENRY 70 CHRIS 23 


~ 


8 cases and 2 variables processed 
SYSTAT file created. 


Display the file you have just created. 


LIST
RUN
          NAME$            AGE
Case 1    TOM           23.000
Case 2    JERRY         51.000
Case 3    MARILYN       50.000
Case 4    LYNNE         18.000
Case 5    MARK          22.000
Case 6    ANDREW         8.000
Case 7    HENRY         70.000
Case 8    CHRIS         23.000
8 cases and 2 variables processed


No SYSTAT file created.


This input works because we used the backslash. SYSTAT reads the
entire line of original data even though each line fills up four cases in the
SYSTAT file. Without the backslash, SYSTAT would have read only
the first two values from each line of data, producing the following file:


          NAME$            AGE
Case 1    TOM           23.000
Case 2    MARK          22.000


2.8 Incomplete records: backslash


The following example shows how to use the backslash to read records 
that do not have an equal number of values per case. 


NEW 
SAVE UNEQUAL 
INPUT A B C D \
RUN 


1 2 3
4 5 6 7
8 9
~


3 cases and 4 variables processed 
SYSTAT file created. 


The SYSTAT data file looks like this: 
              A          B          C          D
Case 1      1.000      2.000      3.000       .
Case 2      4.000      5.000      6.000      7.000
Case 3      8.000      9.000       .          .


2.9 Incorrect use of backslash

The following example shows how to use the backslash to read records
that do not have an equal number of values per line.

NEW 
SAVE KLUDGE 
INPUT A B C \
RUN
1 2 3 4 5
6 7
8


4 cases and 3 variables processed 
SYSTAT file created. 


The SYSTAT data file looks like this: 


              A          B          C
Case 1      1.000      2.000      3.000
Case 2      4.000      5.000       .
Case 3      6.000      7.000       .
Case 4      8.000       .          .


Here is how SYSTAT read the data: 


Case 1. SYSTAT reads the first three values from the first line of origi- 
nal data (1, 2, and 3) into the first case of the SYSTAT file. 


Case 2. SYSTAT begins a new case in the SYSTAT file. With the 
backslash appended to our INPUT command, SYSTAT does not jump 
to a new line of original data. Rather, it stays on the same input record 
and reads the remaining values (4 and 5) into variables A and B of case 2. 


These values fill only the first two columns in case 2 of the SYSTAT 
file. Without the backslash, SYSTAT would complete the case with the 
first value from the next line of original data. The backslash, however, 
causes SYSTAT to fill the remaining values for the case with missing
values. 


Case 3. SYSTAT begins case 3 of the SYSTAT file, reading data from 
the next line of original data (line 2). It reads two values from this line (6 
and 7) into the first two variables, and assigns a missing value to the 
third variable. 


Case 4. SYSTAT begins case 4 of the SYSTAT file and reads the data 
from the next line of original data (line 3). It reads the first and only 
value from this line (8) and fills the remaining two cells with missing 
values. 


If we had not used the backslash in the INPUT statement, SYSTAT 
would have produced the following file: 


A B C 
Case 1 1.000 2.000 3.000 
Case 2 6.000 7.000 8.000 


SYSTAT fills the first case with the values 1, 2, and 3. The case com- 
plete, it starts a second and begins reading from the second input line. 
Thus, the values 4 and 5 are lost. SYSTAT puts the values 6 and 7 in 
case 2 under variables A and B, and reads the third value for this case 
from the third line of data. 
Fixed-format input


With fixed-format input, you tell SYSTAT exactly where the values for
each variable are located in the data records. Values for a variable must 
be in the same place for every record. 


There are two parts to a fixed-format input statement. In the first part, 
name the variables as you want them to appear in the SYSTAT file. The 
second part of the statement contains the format, which determines 
where SYSTAT reads values for each variable. 


As usual, variable names must be 8 characters or less. Be sure to use
dollar signs to indicate character variables. The data type of a format item
(character or numeric) must match the data type of its respective vari- 
able name. Finally, the number of items specified in the format must 
match the number of input variables. 


Enclose the variable names and the input format in separate sets of
parentheses, like this:

INPUT (AGE,SEX$,INCOME) (#3,$6,#8)


The format controls a pointer that tells SYSTAT where to read the next 
variable value. SYSTAT checks the number of items you specify in the 
format against the number of variables. If they do not match, it is an 
error.


Formats

Formats specify the location and width of fields that contain values.
Leading and trailing blanks are ignored for numeric data. All characters 
within the formatted field, except leading blanks, are read into a charac- 
ter value. In other words, character strings are left justified. 


© 1989, SYSTAT, Inc. 27 










Format items for fixed-format input include the following:


#n      reads a numeric variable in the next n columns
$n      reads a character variable in the next n columns
>       moves the pointer one column to the right
<       moves the pointer one column to the left
^n      moves the pointer to column n
/       moves the pointer to the first column on the next record (line of
        data)
%n      moves the pointer to the first column on the nth record
\       leaves the pointer on the current record for the next case
n*r     repeats r n times, where r is any of the above

Some examples:


>>>     moves the pointer 3 columns to the right
3*>     moves the pointer 3 columns to the right
^10     moves to column 10 of the current record
#4      reads the numeric value in the next 4 columns beginning at
        the column where the pointer is now
$5      reads the character value in the next 5 columns beginning at
        the column where the pointer is now
^3      moves the pointer to column 3
>>>>>   moves the pointer 5 columns beyond its current position
5*>     does the same thing
%2      moves the pointer to column 1 of the second record (you may
        not skip back to an earlier record)
/       moves the pointer to column 1 of the next record
//      moves the pointer to column 1 two records ahead (thus, if
        you are starting on the first record, %3 and // mean the same
        thing)
#3      reads the numeric value in the 3 columns beginning at the
        current pointer position
2*$3    reads a character value in three columns and then another in
        the next three columns


Note that \ does not skip to the next record before reading a new case. 
This feature is useful for reading files with different numbers of records 
(lines) per case. 
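

To make the pointer model concrete, here is a small Python sketch that interprets the format items listed above for a single case. It is an illustration under stated assumptions, not SYSTAT's implementation: the function name read_fixed is hypothetical, the caret (^) is assumed to be the move-to-column item, and the backslash item (which concerns the next case) is not modelled.

```python
import re


def read_fixed(records, fmt):
    """Read one case from `records` (a list of lines) using format `fmt`."""
    # Expand repeats such as 3*> or 2*$3, and runs such as >>> or //.
    items = []
    for token in fmt.replace(",", " ").split():
        m = re.fullmatch(r"(\d+)\*(.+)", token)
        if m:
            items.extend([m.group(2)] * int(m.group(1)))
        elif set(token) <= set("<>/"):
            items.extend(token)
        else:
            items.append(token)

    rec, col = 0, 0  # zero-based record index and column pointer
    values = []
    for item in items:
        kind, arg = item[0], item[1:]
        if kind in "#$":
            width = int(arg)
            field = records[rec][col:col + width]
            col += width
            if kind == "#":
                text = field.strip()  # blanks are ignored for numbers
                values.append(float(text) if text else None)
            else:
                values.append(field.lstrip())  # strings are left justified
        elif kind == ">":
            col += 1
        elif kind == "<":
            col -= 1
        elif kind == "^":
            col = int(arg) - 1
        elif kind == "/":
            rec, col = rec + 1, 0
        elif kind == "%":
            target = int(arg) - 1
            if target < rec:
                raise ValueError("cannot skip back to an earlier record")
            rec, col = target, 0
        else:
            raise ValueError(f"format item not modelled in this sketch: {item}")
    return values


simple = read_fixed(["120abcde 007"], "#3 $5 > #3")
two_records = read_fixed(["1232 BILLY", "CEADB"], "#4,$6,%2,2*$1")
skipped = read_fixed(["abc42"], ">>> #2")
```

A blank numeric field comes back as None, mirroring the rule that missing values may be left blank in fixed-format input.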
It is generally safer to use % and ^ rather than / and >, since the former
ensure that you know precisely which record and column you are on.
Furthermore, if you have 7 records (lines of data) per observation, and
you don’t read the seventh record, you need a %7 at the end of your
format to ensure that the pointer is positioned correctly for the next
observation.


2.10 Simple example


Here is a simple example for reading some data with a format.


INPUT (A,B$,C) (#3,$5,>,#3)
RUN
120abcde 007
121fghij 999
~


The tilde (~) indicates that you will not enter any more data.


Note: if your INPUT statement takes up more than one line, do not let 
the statement wrap around to the next line. Rather, end the first line 
with a comma, press Enter, and keep typing the statement on the next 
line. Do this for as many lines as you need. 


Suppose you have an ASCII file TESTDATA like the following. The
first two lines are a column ruler to help you count columns; they are
not part of the file.


0        1         2
12345678901234567890123456
1232 BILLY 0 1 1 1 0 BAgDD
CEAD
7384 SUSAN 1 1 0 1 1 BDAEA
DDEAE
2837 TIM   1 1 1 0 1 CBADE
DDBCA
7484 0M    0 0 1 0 1 BCDEC
AAEDC
5678 WAYNE 1 1 0 1 0 ADEAA
DACBB


The first variable in the file is a four-column ID number. The second is 
the first name of a student. The next five variables are answers to true- 
false questions and are separated by spaces. The last five variables on the 
first line are answers to multiple-choice questions and are not delimited. 
The variables on the second line are answers to five more
multiple-choice questions.


This program reads the data into a SYSTAT file called TEST.SYS. 
Because the INPUT statement takes up more than one line, we use 
commas to continue it onto subsequent lines. 


GET TESTDATA 
SAVE TEST 
INPUT (ID,NAME$,Q(1-5),Q6$,Q7$,Q8$,Q9$,Q10$,
Q11$,Q12$,Q13$,Q14$,Q15$),
(#4,$6,5*#2,>,5*$1,%2,5*$1)
RUN 


Here is how each variable is read by its format description: 


#4      Reads numeric value from first four columns into variable ID.
$6      Reads character value from next six columns into NAME$.
5*#2    Reads five consecutive two-column numeric values into Q(1),
        Q(2), Q(3), Q(4), and Q(5).
>       Moves pointer one space to the right.
5*$1    Reads five consecutive one-column character values
        into Q6$, Q7$, Q8$, Q9$, and Q10$.
%2      Moves pointer to second line of input record.
5*$1    Reads five consecutive one-column character values
        into Q11$, Q12$, Q13$, Q14$, and Q15$.
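

The same walkthrough can be written as explicit column slices. The Python sketch below is only a cross-check of the column arithmetic; the function name read_test_record is hypothetical and the two input lines are made-up values laid out as the format expects, not lines copied from TESTDATA.

```python
def read_test_record(line1, line2):
    """Slice one two-line record the way (#4,$6,5*#2,>,5*$1,%2,5*$1) would."""
    case = {
        "ID": float(line1[0:4]),          # #4   columns 1-4
        "NAME$": line1[4:10].lstrip(),    # $6   columns 5-10
    }
    for k in range(5):                    # 5*#2 columns 11-20
        case[f"Q({k + 1})"] = float(line1[10 + 2 * k:12 + 2 * k])
    # >  skips column 21
    for k in range(5):                    # 5*$1 columns 22-26
        case[f"Q{k + 6}$"] = line1[21 + k]
    for k in range(5):                    # %2, then 5*$1 columns 1-5 of line 2
        case[f"Q{k + 11}$"] = line2[k]
    return case


# Hypothetical record with the layout described in the text.
case = read_test_record("1232 BILLY 0 1 1 1 0 BADDC", "CEADB")
```

Counting columns this way before writing the format is a quick check that the number of format items matches the 16 variables named in the INPUT statement.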
Entering triangular matrices


Data in SYSTAT files assume cases-by-rows (rectangular) form by default.
Both DATA and the regular Data Editor allow you to enter triangular
matrices, such as might be produced by Corr. (See the Getting Started
manual for instruction on how to do this with the Data Editor.)


TYPE


Use the TYPE command to indicate what type of matrix you are entering.
The default is RECTANGULAR. SSCP designates a sum of squares and
cross products matrix. COVARIANCE designates a covariance matrix.
CORRELATION designates a correlation matrix. DISSIMILARITY and
SIMILARITY indicate dissimilarity and similarity data, respectively.
Some procedures like Corr and MDS output a triangular matrix to a
SYSTAT file and automatically set the type.


If you LIST a triangular matrix in DATA, the upper triangular portion
is missing values, since the matrices are symmetric and only half of the
values are needed by the statistical routines.


2.11 Entering a covariance matrix


Here is an example of how to enter a covariance matrix and save it in a
file named TURTLE.


SAVE TURTLE
INPUT LENGTH, WIDTH, HEIGHT
TYPE COVARIANCE
RUN
451.39
271.17 171.73
168.70 103.29 66.65
~


Entering a matrix with missing diagonal


For some types of data, the values on the diagonal are undefined or
constant. You may, in these cases, input only the values below the
diagonal and leave the diagonal missing. Use DIAGONAL ABSENT to
signal to DATA that the diagonal values are missing.


If you do not use the command, DIAGONAL PRESENT is assumed. If
you input a correlation matrix with DIAGONAL ABSENT, SYSTAT
sets the diagonal elements to 1.0. For other matrix types, the diagonal
elements are set to missing values.
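

As a rough picture of what a triangular entry represents, the Python sketch below expands lower-triangle rows into a full symmetric matrix. It is an illustration only: the function name read_triangle and its parameters are hypothetical, and mirroring the values into the upper triangle is a choice made here for clarity (SYSTAT itself stores only the lower half).

```python
def read_triangle(rows, diagonal_present=True, diagonal_fill=None):
    """Expand lower-triangle rows into a full symmetric matrix (list of lists)."""
    n = len(rows) + (0 if diagonal_present else 1)
    full = [[None] * n for _ in range(n)]
    for r, row in enumerate(rows):
        i = r if diagonal_present else r + 1   # row index in the full matrix
        expected = i + 1 if diagonal_present else i
        if len(row) != expected:
            raise ValueError(f"row {r + 1} should have {expected} values")
        for j, value in enumerate(row):
            full[i][j] = value
            full[j][i] = value                 # mirror (sketch only)
    if not diagonal_present:
        # 1.0 for a correlation matrix, missing (None) otherwise.
        for i in range(n):
            full[i][i] = diagonal_fill
    return full


turtle = read_triangle([[451.39], [271.17, 171.73], [168.70, 103.29, 66.65]])
corr = read_triangle([[0.5], [0.2, 0.3]], diagonal_present=False,
                     diagonal_fill=1.0)
```

With the diagonal absent, k rows of input describe k + 1 variables, which is why the number of data rows is one less than the number of variables named in INPUT.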


Here is an example of how to input a similarity matrix with the diagonal 
elements omitted. 


SAVE COLORS
INPUT RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET
TYPE SIMILARITY
DIAGONAL ABSENT
RUN
10
9 9
7 10 10
1 4 9 10
6 5 7 9 9
9 8 5 8 9 10


Notice that only 6 rows of values are entered for 7 variables, because
the diagonal is omitted. The diagonal elements are set to missing. The
example above saves data in a SYSTAT file named COLORS for possible
use by the multidimensional scaling procedure MDS.
Importing files from other applications


Import... in the File menu is a facility of the Data Editor that allows
you to translate Microsoft Excel files directly into SYSTAT files. You
do not need to transfer Excel files into text files to be able to read them
with DATA. See the Getting Started volume.


You can also use the IMPORT command to prepare .MAP files for
SYGRAPH; see Appendix V of the SYGRAPH manual for more
information.


Troubleshooting


The previous examples apply also to ASCII file input; the same sorts of
mistakes that one can make when entering data from the keyboard will
also cause input from ASCII files to go wrong.


You also must be sure that your ASCII file does not contain any 
“funny,” i.e. nonprinting, ASCII characters. The file can contain no
page breaks, control characters, column markers, etc. SYSTAT can read
numbers, alphabetic and keyboard characters, delimiters (spaces, com- 
mas, or slashes that separate consecutive values from each other), and 
carriage returns. SYSTAT does its best to interpret other characters but 
makes no guarantees. 


Also, numeric fields must contain only numeric data. Therefore, exclude 
variable labels or column headings from the ASCII file. 

You can use a word processor to examine ASCII files. If you see
anything in the file other than numbers or typewriter characters, or if the
cursor jumps around erratically on the screen, you do not have an ASCII
file that SYSTAT can read. Some editors such as Microsoft Word and
WordPerfect can display hidden markers (tabs, carriage returns, column
markers, page breaks, etc.) so that you can remove them.


Errors and error messages


Following are some of the error messages encountered when reading
ASCII files.


Empty file error


Error: you are trying to read an empty or nonexistent file


Make sure you spelled the file name correctly.


Make sure the file is in the current folder. If it is not, either copy it to 
the current folder or specify the fully qualified file name (path plus full 
file name) in quotes. 


Make sure the file is a plain text file (ASCII text, non-document, etc.), 
not some other format. 


Long INPUT statement 


If your INPUT statement is too long for one line, end the first line with 
a comma and press Enter before the line wraps around on your screen 
(before column 80). Continue typing the statement on the next line. Do 
this for as many lines as you need. 


Data lost or in the wrong columns 


If SYSTAT places data incorrectly or data is lost when you read it, 
check the following: 


Make sure you correctly specify missing values in your data file. If you
are using free-format input, enter missing numeric values as periods (.)
and missing character values as a blank surrounded by quotation marks
(" "). If you are using fixed-format input, you may leave missing values
as blank spaces.


If you are using free-format input to read a file that does not have the 
same number of values in every record, add the backslash (\) to the end 
of your INPUT statement (see above). 


Unexpected data error


Error: unexpected data for case # at end of this line: 
last data entered before error was encountered 
This may result from character data in numeric field or vice versa 


Make sure the ASCII file includes no field headings or variable labels. 


Make sure variable types match data types. Do not put character data 
under a numeric variable or vice-versa. 


Make sure you correctly specify missing values. With free-format input,
put a period where there are missing numbers. In fixed-format input,
you may leave missing values as blank spaces.


If you are using free-format input to read a file that does not have the 
same number of values in every record, add the backslash (\) to the end 
of your INPUT statement. 


If you are using fixed-format input, make sure you specify the variable 
types in the format section correctly ($n for character data, #n for nu- 
meric). Also, make sure the format correctly tells SYSTAT where the 
values for each variable are located. 


Non-ASCII character warning


If you try to read an ASCII file and receive the warning 


***Warning*** Non-ASCII character on case # will be converted 
to blank 


check for non-printing characters in your file. Such characters include 
control characters, tab markers, margin and page-break indicators, etc. 


Nonmatching number of variables error 


If, with fixed-format input, you receive the error message 


Error: number of format items does not match number of variables 


the number of variables defined in the format of your INPUT statement 
does not match the number of variables named in the variable list. 


Input past end of record error 


If, with fixed-format input, you receive the error message 


Error: input past end of record. Check your format.


the format of your INPUT statement tells SYSTAT to read beyond the
end of a record in your ASCII file. This message should rarely occur.
3 Printing data files


Command reference 


Printing data 
LIST 
3.1 Listing the first ten cases 
3.2 Listing many variables 
PRINT 
3.3 Printing several variables 
OUTPUT 
3.4 Printing three variables 







Overview


The simplest tasks in SYSTAT’s DATA facilities are viewing and printing
files. You can accomplish both tasks quite easily using the regular
Data Editor (see the Getting Started manual), but the tasks introduce you
to some basic DATA commands that you will need for more complicated
tasks.
Command reference


LIST [varlist]

Lists the contents of the file named by the
USE statement. Varlist is an optional list
of variables for viewing only a portion of
the file.


OUTPUT * | @ | filename

Redirects output. Use * to send output to
the screen (the default), @ to send to the
printer, or specify a filename to save to a
file.


PRINT varlist | 'string'

Displays the values of the variables listed
in varlist, or displays the character string
you specify. Varlist may include numeric
or character variables. See Chapter 7 for
information on using the character string
argument.


REPEAT n

Applies subsequent commands to the first
n cases in the file. When you first use a
file, the default for REPEAT is the number
of cases in the file. Otherwise, the default
n is 0 if you type only REPEAT.


RUN

Sets a DATA procedure in motion. HOT.


USE filename [varlist]

Retrieves the SYSTAT data file filename.
Varlist is an optional list of variables that
you can use if you only want to work
with some of the variables in filename.
Printing data


LIST


Use the LIST command to view a file as follows: 


USE filename 
LIST [varlist] 
RUN 


When you open a SYSTAT file with the USE command, SYSTAT displays
the names of all the variables in the file. You can then specify the
variables you want to see with LIST. If you do not specify any variables,
SYSTAT lists the contents of all the variables in the file. Type RUN to
have SYSTAT perform the commands you have specified.


SYSTAT shows all the cases of the variables. If you only want to see 
some of the cases, you can specify a number of cases with REPEAT. 


REPEAT n


When you want to list more than 5 variables, you may want to add:


PAGE=WIDE


This way, SYSTAT prints 9 variables per line. See Appendix I, 
Command reference, for further information. 


You can control the number of decimal places to be printed with the
FORMAT command. See Appendix I, Command reference, for further
information.
3.1 Listing the first ten cases


To list the first ten cases of the variables SPIRITS, WINE, and BEER
from the SYSTAT file USDATA, enter:


USE USDATA


SYSTAT responds by listing the variables in the file:


SYSTAT file variables available to you are:
STATE$     REGION$    REGION     DIVISION$  DIVISION
LANDAREA   POP85      ACCIDENT   CARDIO     CANCER
PULMONAR   PNEU_FLU   DIABETES   LIVER      DOCTORS
HOSPITAL   MARRIAGE   DIVORCE    FDSTAMPS   TEACHERS
TCHRSAL    HSGRAD     AVGPAY     TOTALSLE   BLDGMTRL
MRCHDSE    FOODSTRS   AUTODLRS   GASSTATN   APPAREL
FURNITUR   EATNDRNK   DRUGSTRE   ALLSALES   VIOLENT
VIOLRATE   PROPERTY   PROPRATE   PRISONER   SPIRITS
WINE       BEER       TAXES


Now use REPEAT to specify ten cases, LIST to name the three variables
you want to see, and RUN to start:


REPEAT 10
LIST SPIRITS, WINE, BEER
RUN

               SPIRITS        WINE        BEER
Case     1       2.680       2.710      30.160
Case     2       6.180       5.000      48.100
Case     3       3.040       4.120      35.660
Case     4       3.210       4.250      31.890
Case     5       2.700       4.370      32.820
Case     6       3.170       4.180      26.580
Case     7       2.720       4.140      27.580
Case     8       2.900       4.600      28.610
Case     9       1.740       1.840      33.010
Case    10       1.600       2.090      33.980


10 cases and 43 variables processed.
No SYSTAT file created.


3.2 Listing many variables


If you LIST more variables than SYSTAT can fit on one line, the cases
continue or “wrap around” on subsequent lines. For example, the
following LIST command does not specify any variables. Therefore,
SYSTAT lists all the variables in the file.


REPEAT 3
LIST
RUN


The first three cases of USDATA look like this:


                STATE$     REGION$      REGION   DIVISION$    DIVISION
              LANDAREA       POP85    ACCIDENT      CARDIO      CANCER
              PULMONAR    PNEU_FLU    DIABETES       LIVER     DOCTORS
              HOSPITAL    MARRIAGE     DIVORCE    FDSTAMPS    TEACHERS
               TCHRSAL      HSGRAD      AVGPAY    TOTALSLE    BLDGMTRL
               MRCHDSE    FOODSTRS    AUTODLRS    GASSTATN     APPAREL
              FURNITUR    EATNDRNK    DRUGSTRE    ALLSALES     VIOLENT
              VIOLRATE    PROPERTY    PROPRATE    PRISONER     SPIRITS
                  WINE        BEER       TAXES
Case     1          ME   Northeast       1.000 New England       1.000
Case     1   33265.000    1164.000      37.700     466.200     213.800
Case     1      33.600      21.100      15.600      14.500    1773.000
Case     1      47.000      12.600       5.900     120.000      12.300
Case     1   17328.000      14.600   14130.000    5169.000     304.000
Case     1     462.000    1275.000     914.000     387.000     210.000
Case     1     133.000     411.000     161.000    5332.000       1.800
Case     1     160.000      40.400    3522.000    1025.000       2.680
Case     1       2.710      30.160       0.630
Case     2          NH   Northeast       1.000 New England       1.000
Case     2    9279.000     998.000      35.900     395.900     182.200
Case     2      29.600      20.100      17.600      10.400    1612.000
Case     2      34.000      11.100       4.600      41.000       9.700
Case     2   17376.000      11.500   15541.000    5239.000     332.000
Case     2     525.000    1252.000     949.000     398.000     244.000
Case     2     192.000     418.000     133.000    5354.000       1.200
Case     2     125.000      31.000    3231.000     527.000       6.180
Case     2       5.000      48.100       1.030
Case     3          VT   Northeast       1.000 New England       1.000
Case     3    9614.000     535.000      41.300     433.100     198.100
Case     3      33.100      24.000      15.600      13.100    1154.000
Case     3      19.000       5.500       2.500      50.000       6.200
Case     3   17931.000       6.000   14643.000    2529.000     173.000
Case     3     171.000     596.000     464.000     217.000     107.000
Case     3      76.000     217.000      68.000    2601.000       0.700
Case     3     133.000      21.000    4000.000     536.000       3.040
Case     3       4.120      35.660       0.450


3 cases and 43 variables processed.
No SYSTAT file created.


The individual cases are too long to fit on one line, so each case
occupies nine lines. Consider PAGE=WIDE to list 9 variables on a line
instead of 5.


PRINT


The PRINT command prints values of variables you specify. It is similar
to LIST, except case numbers and variable names are not printed.
Examples:


PRINT AGE, SEX$
PRINT A, B, C, NAME$


If you do not include an argument after PRINT, a blank line is printed.


3.3 Printing several variables


Compare the following with the output produced in Example 3.1.
Notice how the variable names and case numbers are missing. With
PAGE=WIDE, you can display up to 10 variables on a line.


USE USDATA
REPEAT 3
PRINT SPIRITS, WINE, BEER
RUN
       2.680       2.710      30.160
       6.180       5.000      48.100
       3.040       4.120      35.660


SYSTAT prints numeric and character values in 12-column, right-justified
fields. Blanks pad the left side of each field. You can use FORMAT
to set the number of decimal places shown for numeric values.


(The FORMAT command is the command equivalent of the “Decimals
to show” option in Formats... from the Editor menu.)


OUTPUT


To obtain a hard copy from your printer, include the OUTPUT @
command with LIST or PRINT.


The command OUTPUT * redirects output back to the screen when
your computer is done printing the SYSTAT file.


3.4 Printing three cases


This example prints the first three cases of USDATA.


USE USDATA 
OUTPUT @ 
REPEAT 3 
LIST 

RUN 

OUTPUT * 


When you type RUN, SYSTAT sends the same output shown in the
previous example to your printer while also displaying it on your screen.
Notice that OUTPUT * is used at the end to redirect output to the
screen.


4 Saving data in text files


Command reference


Basics
USE and PUT
Tab-delimited files
Decimal places
OUTPUT and PRINT
4.1 Putting data into a text file
4.2 Saving selected cases
4.3 Saving selected variables
4.4 Changing a variable’s type
4.5 Unpacking records


Overview

This chapter shows how to convert SYSTAT data files to plain text files 
in various formats. 


Note that File/Save as... lets you save data files to text from the Data
Editor. You need DATA only to change a variable’s type from numeric
to character (see Example 4.4) or to print values with special formats as
part of a more general program.


Command reference


MAC filename

Saves data in a SYSTAT data file into a
plain text (ASCII) file with tab delimiters.


OUTPUT * | @ | filename

Redirects output. Use * to send output to
the screen (the default), @ to send to the
printer, or specify a filename to save to a
file.


PUT filename

Saves data in a SYSTAT data file into a
plain text (ASCII) file with comma
delimiters.


USE and PUT


The general strategy is to open a SYSTAT data file with USE, specify
an output ASCII file with PUT, and then type RUN.


USE datafile
PUT outputfile
RUN


PUT saves your data in a text file that has up to 12 columns, with
each column separated by commas. Character values (strings) are
surrounded by double quotation marks (").


Tab-delimited files


You can also save text files with columns separated by tabs. Use the
command MAC in place of PUT. MAC works exactly the same as PUT
except that it uses tab delimiters.


Decimal places


Numeric values default to 3 decimal places. You can change this by
placing a FORMAT command before RUN. For example, this program
will write 7 digits after the decimal:


USE datafile
PUT outputfile
FORMAT=7
RUN


OUTPUT and PRINT


You can also use OUTPUT and PRINT to save data to an ASCII file.
OUTPUT redirects PRINT so that it saves the values of the variables
you specify in a file, rather than displaying them in the window.


USE datafile
PRINT varlist
OUTPUT outputfile
RUN


Varlist specifies the variable(s) you want to include in the ASCII file.
After you are done, remember to direct output back to the screen again
with OUTPUT *.
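

For readers who want to reproduce a PUT-style record elsewhere, the following Python sketch formats one case as delimited text. It rests on assumptions rather than on documented SYSTAT internals: the function name format_record is hypothetical, and the 12-character field width for both numbers and quoted strings is inferred from the sample output in this chapter.

```python
def format_record(values, decimals=3, delimiter=","):
    """Format one case as a delimited text record (sketch of PUT / MAC)."""
    fields = []
    for v in values:
        if isinstance(v, str):
            # Character values: padded to 12 characters and quoted (assumed).
            fields.append('"' + v.ljust(12)[:12] + '"')
        else:
            # Numeric values: right-justified, fixed number of decimals.
            fields.append(f"{v:12.{decimals}f}")
    return delimiter.join(fields)


comma_record = format_record(["ME", 1.0])              # like PUT
seven_places = format_record([2.68], decimals=7)       # like FORMAT=7
tab_record = format_record([1.0, 2.0], delimiter="\t") # like MAC
```

The decimals argument plays the role of FORMAT, and swapping the delimiter for a tab corresponds to using MAC in place of PUT.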


4.1 Putting data into a text file


The following commands convert the SYSTAT file USDATA to a text
file.


USE USDATA
PUT TEXTFILE
RUN


You can use a word processor to view TEXTFILE. The first case in the
file appears as follows:


"ME ", "Northeast ", 1.000, "New England ",
1.000, 33265.000, 1164.000, 37.700, 466.200,
213.800, 33.600, 21.100, 15.600, 14.500,
1773.000, 47.000, 12.600, 5.900, 120.000,
12.300, 17328.000, 14.600, 14130.000, 5169.000,
304.000, 462.000, 1275.000, 914.000, 387.000,
210.000, 133.000, 411.000, 161.000, 5332.000,
1.800, 160.000, 40.400, 3522.000, 1025.000,
2.680, 2.710, 30.160, 0.630


This is one record, with a carriage return appearing only at the end.
The record “wraps around” here for display purposes. Notice that
commas are used as data separators and character variables are
surrounded by quotes. This is not true if you use PRINT, as in the
following examples.


4.2 Saving selected cases


You can save only certain cases in a text file by using IF ... THEN
PRINT. Here is an example:


USE USDATA
OUTPUT TEXTFILE
IF REGION$="Northeast" THEN PRINT BEER WINE
RUN
OUTPUT *


Other possible selections are statements like:


IF CASE=1 OR CASE=3 OR CASE=34 THEN PRINT BEER WINE
IF SPIRITS>2 AND SPIRITS<3 THEN PRINT BEER WINE


4.3 Saving selected variables


You can save only certain variables in a text file by listing variable names
in the USE command.


USE USDATA (REGION DIVISION BEER WINE)
PUT TEXTFILE
RUN


4.4 Changing a variable’s type


To change a character variable to a numeric variable or vice versa, you
must create a new variable of the correct type. You cannot do this with
the Data Editor. If the numeric variable you want to change has few
values, use SYSTAT transformation statements to create the new variable.
You can most efficiently create a character variable from a categorical
numeric variable with the LABEL command.


If the variable has many values, however, create a text file containing the
variable you want to change by using OUTPUT and PRINT. Then,
read the variable back into the original file in the desired format using
the GET and INPUT commands. Then you can drop the original
variable from the file, if you wish.


This example uses the variables BEER, SPIRITS, and WINE from
USDATA. It outputs the values of BEER to a text file and then reads
those values back into the character variable BEER$. Thus, it changes a
numeric to a character variable. The results are saved in the file FINAL.


This first step creates the text file TEXT that contains the values for
BEER. The command OUTPUT TEXT directs subsequent output to
the file TEXT. The OUTPUT * command causes SYSTAT to close
TEXT and send output back to the screen.


USE USDATA
OUTPUT TEXT
PRINT BEER
RUN
OUTPUT *


This step reads the variables SPIRITS, WINE, and BEER from
USDATA, drops the variable BEER, and then reads the values from
TEXT into BEER$.


USE USDATA(SPIRITS WINE BEER)
SAVE FINAL
GET TEXT
INPUT BEER$
DROP BEER
RUN


A listing of the first ten cases of FINAL produces:


               SPIRITS        WINE       BEER$
Case     1       2.680       2.710      30.160
Case     2       6.180       5.000      48.100
Case     3       3.040       4.120      35.660
Case     4       3.210       4.250      31.890
Case     5       2.700       4.370      32.820
Case     6       3.170       4.180      26.580
Case     7       2.720       4.140      27.580
Case     8       2.900       4.600      28.610
Case     9       1.740       1.840      33.010
Case    10       1.600       2.090      33.980


Do not confuse the values in the variable BEER$ with numeric values.
They are now numerals, not numbers, which can be used to label cases as
discrete categories. If you try to compute statistics or transformations on
BEER$, you will get an error message.


4.5 Unpacking records


This example shows how to transform several repeated measures on a
single record into one measure per record. We use subscripted variable
names for convenience.


SAVE TRIAL 
INPUT X(1-5), SEX$ 
RUN 


10 20 30 40 50 Male
11 21 31 41 51 Female


The file contains two records:


                  X(1)        X(2)        X(3)        X(4)        X(5)
                  SEX$
Case     1      10.000      20.000      30.000      40.000      50.000
                  Male
Case     2      11.000      21.000      31.000      41.000      51.000
                Female


The following commands make a new file with 10 records. First, create
a temporary text file with 10 records, each containing one data value
plus a sequence number and label:


USE TRIAL
OUTPUT TEMP
FOR I=1 TO 5
PRINT X(I),I,SEX$
NEXT
RUN


Then, read the data from the text file TEMP into a SYSTAT file
NEWTRIAL:


OUTPUT *
GET TEMP
INPUT X,I,SEX$
SAVE NEWTRIAL
RUN


A listing of NEWTRIAL produces:


                     X           I        SEX$
Case     1      10.000       1.000        Male
Case     2      20.000       2.000        Male
Case     3      30.000       3.000        Male
Case     4      40.000       4.000        Male
Case     5      50.000       5.000        Male
Case     6      11.000       1.000      Female
Case     7      21.000       2.000      Female
Case     8      31.000       3.000      Female
Case     9      41.000       4.000      Female
Case    10      51.000       5.000      Female


See the next chapter, Rearranging and combining files, for more
sophisticated data manipulation of this sort.
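

The unpacking step is a wide-to-long reshape. As a cross-check of the listing above, here is a minimal Python sketch of the same transformation; the function name unpack is hypothetical and the data are the two records entered in this example.

```python
def unpack(cases, n_measures=5):
    """Turn one record with several measures into one record per measure."""
    long_rows = []
    for *xs, sex in cases:
        # Same order as FOR I=1 TO 5 / PRINT X(I),I,SEX$ / NEXT
        for i, x in enumerate(xs[:n_measures], start=1):
            long_rows.append((x, i, sex))
    return long_rows


trial = [
    (10, 20, 30, 40, 50, "Male"),
    (11, 21, 31, 41, 51, "Female"),
]
newtrial = unpack(trial)
```

Each input record yields five output records, so two cases become ten, with the label repeated on every record.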


5 Rearranging and combining files


Overview


In this chapter, you will learn how to do various file manipulation tasks
including merging files, dropping variables, selecting subsets of
variables, deleting cases, and rearranging variables.


Command reference


APPEND file1 file2

Creates a new file (named by a SAVE
command) by appending cases of file2 at
the bottom after cases of file1. Both files
must contain the same variables, in the
same order, but they can have different
numbers of cases. You must use SAVE
before APPEND, which is HOT.


DELETE

Prevents the current case from being written
to the SAVE file. You can use DELETE
only with an IF... THEN command.


DROP varlist

Prevents the variables given by varlist
from being written to the file named by
SAVE.


TRANSPOSE

Transposes a data file by turning rows
(cases) into columns (variables) and vice
versa. You can only transpose files with
numeric data. TRANSPOSE can handle a
maximum of 99 cases (before transposing).


USE file1 [varlist] file2 [varlist]

Brings both file1 and file2 into the active
workspace. You can merge these files
into a single third file. Use the optional
varlists if you want to merge only portions
of the file(s).


USE filename (varlist)

Retrieves the specified variables from the
SYSTAT file filename.


You can cut and paste in the Data Editor to select subsets and rearrange
rows and columns of a SYSTAT file. More complex operations that
require programming, however, need to be done in DATA.


To produce a file that contains a subset of variables from an existing file, 
you can either drop variables with the DROP command or select a sub- 
set of variables with the USE command. 


DROP takes a varlist argument: follow the word DROP with the 
name(s) of the variable(s) you want to omit. Examples are: 


DROP DEAD 
DROP OUT 


Imagine a file called OLDFILE that contains the variables WANTED 
and UNWANTED. Both of the following procedures create a file 
NEWFILE that contains only the variable WANTED: 


USE OLDFILE            USE OLDFILE(WANTED)
DROP UNWANTED          SAVE NEWFILE
SAVE NEWFILE           RUN
RUN


The USE command extracts a small number of variables from a large
file most easily. To delete a small number of variables from a large file,
the DROP command is more convenient.


We illustrate both methods below. 


Here, we drop the variables A and B from a file DATASET that
contains:


A B C D 
Case 1 1.000 2.000 3.000 4.000 
Case 2 5.000 6.000 7.000 8.000 
Case 3 9.000 1.000 2.000 3.000 
Case 4 4.000 5.000 6.000 7.000 
Case 5 8.000 9.000 1.000 2.000 


The following program saves only C and D into NEWDATA:


USE DATASET
SAVE NEWDATA
DROP A B
RUN


To see the contents of NEWDATA, enter:


USE NEWDATA
LIST
RUN

                     C           D
Case     1       3.000       4.000
Case     2       7.000       8.000
Case     3       2.000       3.000
Case     4       6.000       7.000
Case     5       1.000       2.000


Note: after you list a variable in a DROP command, you cannot refer to
it in any other DATA command. For this reason you should usually
make DROP the last DATA command you issue before RUN. For
example, the following produces an error message about using an
uninitialized variable.


DROP X
LET X2=X^2
RUN


5.2 Extracting three variables


You can select subsets of variables with the USE command. Here, we
save the variables SPIRITS, WINE, and BEER from the data set
USDATA into a file called LIQUOR.


USE USDATA(SPIRITS WINE BEER)
SAVE LIQUOR
RUN


SYSTAT responds by listing only the variables you specified:


SYSTAT file variables available to you are:
SPIRITS WINE BEER
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When you type RUN, SYSTAT responds: 


50 cases and 3 variables processed. 
SYSTAT file created. 


Now enter: 
USE LIQUOR 


When you issue this USE command, SYSTAT shows that the variables 
you selected previously are the only ones in the file: 


SYSTAT file variables available to you are: 
SPIRITS WINE BEER 


You can reorder variables in a SYSTAT file with the USE command by 
specifying the variables in their new order in parentheses. 


This example extracts the variables SPIRITS, WINE, and BEER from 
USDATA and rearranges them in the new file ALCOHOL. 


USE USDATA(BEER SPIRITS WINE) 
SAVE ALCOHOL 
RUN 


Now enter the following: 


USE ALCOHOL 


SYSTAT file variables available to you are: 
BEER SPIRITS WINE 


The variables are now in the order we specified on the USE command, 
not in their previous USDATA order (SPIRITS, WINE, BEER). 
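
The selection and reordering rules above can be sketched outside SYSTAT. The Python below is only an illustration of the behavior described in this section, not SYSTAT code: the functions use and drop and the sample values are hypothetical, and a file is modeled as a list of dictionaries whose key order stands for variable order.

```python
def use(rows, varlist=None):
    """Keep only the variables in varlist, in that order (all variables if None)."""
    if varlist is None:
        return [dict(row) for row in rows]
    return [{name: row[name] for name in varlist} for row in rows]


def drop(rows, varlist):
    """Omit the variables in varlist and keep the rest in their original order."""
    return [{k: v for k, v in row.items() if k not in varlist} for row in rows]


# One made-up case; the numbers are placeholders, not values from USDATA.
usdata = [{"SPIRITS": 1.0, "WINE": 2.0, "BEER": 3.0, "CARDIO": 4.0}]

# Analogue of USE USDATA(BEER SPIRITS WINE): subset and reorder in one step.
alcohol = use(usdata, ["BEER", "SPIRITS", "WINE"])
print(list(alcohol[0]))

# Analogue of DROP CARDIO: the remaining variables keep their old order.
print(list(drop(usdata, ["CARDIO"])[0]))
```

As in the text, the USE-style selection is shorter when you want few variables, and the DROP-style selection is shorter when you want most of them.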


The DELETE command prevents cases from being saved into a file. Use it
with the IF... THEN statement to delete cases selectively.


Examples:


IF CASE=10 THEN DELETE 


IF GROUP>3 THEN DELETE 
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SYSTAT prevents the cases you identify from being saved into a new 
file. You cannot remove cases from the original file. Rather, you must 
create a new file that contains all the data in the original file except for 
the cases that you tell SYSTAT to delete. 
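
As a rough illustration of this behavior (a Python sketch, not SYSTAT code), deletion can be pictured as copying every case that fails the condition into a new list. The function save_without and the placeholder state codes are assumptions made for the example; CASE is modeled as the 1-based position in the original file.

```python
def save_without(rows, should_delete):
    """Copy rows into a new list, skipping every case the condition marks for deletion.

    `should_delete(case, row)` receives the 1-based case number in the ORIGINAL file.
    """
    return [row for case, row in enumerate(rows, start=1)
            if not should_delete(case, row)]


# Fifty placeholder cases (S01..S50); these are not the real USDATA values.
usdata = [{"STATE$": f"S{i:02d}"} for i in range(1, 51)]

# Analogue of: IF CASE<=45 THEN DELETE
newdata = save_without(usdata, lambda case, row: case <= 45)

# The original list is untouched; the new list numbers its cases from 1 again.
for new_case, row in enumerate(newdata, start=1):
    print(new_case, row["STATE$"])
```

The original list is never modified, which mirrors the rule that you cannot remove cases from the original file.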


5.4 Deleting cases

The following program causes SYSTAT to create a file NEWDATA
that contains only the last 5 cases of USDATA. We list the variable
STATE$ to show the cases SYSTAT has saved.


USE USDATA
SAVE NEWDATA
IF CASE<=45 THEN DELETE
LIST STATE$
RUN

           STATE$
Case 46    WA
Case 47    OR
Case 48    CA
Case 49    AK
Case 50    HI


5 cases and 43 variables processed. 
SYSTAT file created. 


These are the cases SYSTAT has saved. In the new file, it numbers
them cases 1-5:


           STATE$
Case 1     WA
Case 2     OR
Case 3     CA
Case 4     AK
Case 5     HI

5.5 Saving certain cases

The following program saves into NEWFILE only those cases in
USDATA where DIVISION equals 2 or 4.


USE USDATA
SAVE NEWFILE
IF DIVISION<>2 AND DIVISION<>4 THEN DELETE
LIST DIVISION
RUN

           DIVISION
Case 7        2.000
Case 8        2.000
Case 9        2.000
Case 15       4.000
Case 16       4.000
Case 17       4.000
Case 18       4.000
Case 19       4.000
Case 20       4.000
Case 21       4.000


10 cases and 43 variables processed. 
SYSTAT file created. 


Transposing a file

The TRANSPOSE command transposes the cases and variables of a
file. The command has no arguments. Just type it before you RUN. You
must SAVE to a new file, however, to retain the transposed file.


You cannot transpose a file that contains a character variable unless the
character variable is named LABEL$ and is the first variable in the file.
In this instance TRANSPOSE uses the values in LABEL$ to label the
variables in the transposed file.


You cannot transpose a file with more than 99 cases. Transposing a
symmetric matrix (e.g. correlations) is unnecessary since the transpose of a
symmetric matrix is the original matrix.


The transposed file contains an additional character variable called
LABEL$ that contains the old variable names. The names for the
columns in the new transposed file are COL(1)-COL(n), where n is the
number of cases you had in the matrix to be transposed.


RANK, SORT, STANDARDIZE, and TRANSPOSE cannot be used
jointly in one step (one RUN). If you want to STANDARDIZE and
then TRANSPOSE, for example, do something like the following:


USE FILE1 
SAVE FILE2 
STANDARDIZE 
RUN 


USE FILE2
SAVE FILE3
TRANSPOSE
RUN


You can transpose a transposed file. This standardizes MYFILE by 
rows: 


USE MYFILE 
SAVE TFILE 
TRANSPOSE 
RUN 


USE TFILE 
SAVE SFILE 
STANDARDIZE 
RUN 


USE SFILE 
SAVE MYFILE 
TRANSPOSE 
RUN 


MYFILE is standardized within rows but otherwise is the same. 
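
The three-step recipe above works because standardizing operates on variables (columns). A minimal Python sketch of the same idea follows. It is not SYSTAT code, the helper names are hypothetical, and it assumes that standardizing means subtracting the mean and dividing by the sample standard deviation (n - 1 denominator), which this section does not state.

```python
from statistics import mean, stdev


def transpose(matrix):
    """Swap cases (rows) and variables (columns)."""
    return [list(column) for column in zip(*matrix)]


def standardize(matrix):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    standardized_columns = []
    for column in transpose(matrix):
        m, s = mean(column), stdev(column)   # a constant column would make s zero
        standardized_columns.append([(x - m) / s for x in column])
    return transpose(standardized_columns)


def standardize_rows(matrix):
    """Transpose, standardize the columns, then transpose back."""
    return transpose(standardize(transpose(matrix)))


myfile = [[1.0, 2.0, 3.0],
          [10.0, 20.0, 60.0]]   # made-up data: 2 cases, 3 variables

for row in standardize_rows(myfile):
    print([round(x, 3) for x in row])
```

The result has the same shape as the input, and each row (rather than each column) now has mean 0 and standard deviation 1.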
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Combining files 





There are two ways to combine files: horizontally (side-by-side, concate- 
nating different variables for the same cases) and vertically (end-to-end, 
concatenating different cases for the same variables). 


The USE command performs horizontal concatenation:


USE A B
SAVE C
RUN


The APPEND command performs vertical concatenation.


SAVE C
APPEND A B


The APPEND command does not require RUN to execute. You can
merge or append only two files at one time. If you have more than two
files, merge them successively, two at a time, until they are all part of
one file.
Merging horizontally

The USE command allows you to join two SYSTAT files horizontally
(side-by-side). SYSTAT produces a file containing the variables in the
first file followed by the unique variables in the second file. If you do
not specify any index variable(s), SYSTAT matches the cases from the
two files in order, matching the first case from each file, then the second
case from each file, and so on until the last case. If one file has more
observations than the other, SYSTAT assigns missing values to the
variables from the shorter file for all the unmatched observations. If the
same variable name appears in both files, SYSTAT uses the values from
the second file, thus overwriting the values of that variable from the
first file.


The total number of variables in the two files cannot exceed the number
allowed in a single SYSTAT file.


You can subset and reorder variables when you merge, e.g.:


USE MOE(X Y Z) JOE(C A B)
USE MOE(Z) JOE
USE MOE JOE(A B)


The first example merges two files, extracting variables X, Y, and Z
from MOE and variables C, A, and B from JOE. The second selects
variable Z from MOE and all the variables in JOE. The third selects all
the variables from MOE and A and B from JOE.


Merging by key variables

You can merge files using a key (index) variable (or several key vari-
ables). You must sort both files on the key variable(s) before merging.


SYSTAT matches the cases that have the same values for the key vari-
able(s) and merges them into one case in the new file. If a value of the
key variable(s) appears in one file but not in the other, the merged file
records missing values for the variables from the file that lacks that value.
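
The matching rule can be sketched in Python as follows. This is an illustration only, not SYSTAT's implementation: merge_by_key is a hypothetical helper, None stands in for a missing value, and both inputs are assumed to be sorted on the key already. When a key value occurs several times in one file and once in the other, the sketch repeats the single row. How it pairs rows when a key repeats in both files is an arbitrary choice of the sketch, since this section does not describe that case.

```python
from itertools import groupby

MISSING = None  # stands in for SYSTAT's missing value


def merge_by_key(first, second, key, first_vars, second_vars):
    """Merge two lists of dict rows that are both sorted on `key`."""
    def group(rows):
        return {k: list(g) for k, g in groupby(rows, key=lambda r: r[key])}

    g1, g2 = group(first), group(second)
    merged = []
    for k in sorted(set(g1) | set(g2)):
        rows1 = g1.get(k, [{}])          # an empty row yields missing values
        rows2 = g2.get(k, [{}])
        for i in range(max(len(rows1), len(rows2))):
            r1 = rows1[min(i, len(rows1) - 1)]   # repeat the last row if short
            r2 = rows2[min(i, len(rows2) - 1)]
            row = {key: k}
            for name in first_vars:
                row[name] = r1.get(name, MISSING)
            for name in second_vars:
                row[name] = r2.get(name, MISSING)
            merged.append(row)
    return merged


a = [{"KEY": 1, "X": 10}, {"KEY": 2, "X": 20}]
b = [{"KEY": 1, "Y": 100}, {"KEY": 3, "Y": 300}]
for row in merge_by_key(a, b, "KEY", ["X"], ["Y"]):
    print(row)
```

With the two small files shown here, key 1 is matched, key 2 gets a missing Y, and key 3 gets a missing X.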


USE A B/KEY
SAVE C
RUN


File A         File B         File C
KEY    X       KEY    Y       KEY    X      Y
  1   10         1  100         1   10    100
  2   20         3  300         2   20      .
                                3    .    300


One key variable may have many occurrences of a value that appears
only once in the other file. For this, SYSTAT replicates the values from
the other file. For example:


File KIDS        File MOMS        File PAIRS
FAMILY    X      FAMILY    Y      FAMILY    X      Y
     1   10           1  100           1   10    100
     1   11                            1   11    100
     1   12                            1   12    100


5.6 MERGE example

This example demonstrates merging two files with USE. One file,
NAME, contains the names of men who have been presidential candi-
dates in the variable NAME$. The second file, PARTY, contains their
party affiliations in the variable PARTY$:


NAME$          PARTY$

Eisenhower     Republican
Stevenson      Democrat
Kennedy        Democrat
Goldwater      Republican
Johnson        Democrat
Humphrey       Democrat
McGovern       Democrat
Nixon          Republican
Ford           Republican
Carter         Democrat
Reagan         Republican
Bush           Republican


A one-to-one correspondence exists between the cases in the two files. 
The first case from the NAME file corresponds with the first case in the 
PARTY file. Now we can merge these files into a file called 
CANDIDAT. 


USE NAME PARTY
SAVE CANDIDAT
LIST
RUN


When you issue the USE command, SYSTAT lists the variables from 
both files: 


SYSTAT file variables available to you are: 
NAME$ PARTY$ 


When you type RUN, SYSTAT responds: 


NAME$ PARTY$ 
Case 1 Eisenhower Republican 
Case 2 Stevenson Democrat 
Case 3 Kennedy Democrat 
Case 4 Goldwater Republican 
Case 5 Johnson Democrat 
Case 6 Humphrey Democrat 
Case 7 McGovern Democrat 
Case 8 Nixon Republican 
Case 9 Ford Republican 
Case 10 Carter Democrat 
Case 11 Reagan Republican
Case 12 Bush Republican 


12 cases and 2 variables processed. 
SYSTAT file created. 


5.7 Merging with a key variable

This example merges the files ELECTION and CANDIDAT by the
variable NAME$. CANDIDAT was created in Example 5.6. Now we
can make ELECTION.


SAVE ELECTION
INPUT NAME$, LOSER$, YEAR
RUN
Eisenhower Stevenson 1952
Eisenhower Stevenson 1956
Kennedy Nixon 1960
Johnson Goldwater 1964
Nixon Humphrey 1968
Nixon McGovern 1972
Carter Ford 1976
Reagan Carter 1980
Reagan Mondale 1984
Bush Dukakis 1988


YEAR designates the year of the presidential election, NAME$ the
candidate who won that year, and LOSER$ the candidate who lost.


First, sort CANDIDAT and ELECTION: 


USE CANDIDAT 
SAVE CANDSORT 
SORT NAME$ 
RUN 

USE ELECTION 
SAVE ELECSORT 
SORT NAME$ 
RUN 


To merge by NAME$, USE both sorted files in a single command and
specify the key variable, NAME$.


USE ELECSORT CANDSORT/NAME$
SAVE MERGFILE
RUN


A LIST of MERGFILE reads:


           NAME$         LOSER$       YEAR        PARTY$
Case 1     Bush          Dukakis      1988.000    Republican
Case 2     Carter        Ford         1976.000    Democrat
Case 3     Eisenhower    Stevenson    1952.000    Republican
Case 4     Eisenhower    Stevenson    1956.000    Republican
Case 5     Ford                           .       Republican
Case 6     Goldwater                      .       Republican
Case 7     Humphrey                       .       Democrat
Case 8     Johnson       Goldwater    1964.000    Democrat
Case 9     Kennedy       Nixon        1960.000    Democrat
Case 10    McGovern                       .       Democrat
Case 11    Nixon         Humphrey     1968.000    Republican
Case 12    Nixon         McGovern     1972.000    Republican
Case 13    Reagan        Carter       1980.000    Republican
Case 14    Reagan        Mondale      1984.000    Republican
Case 15    Stevenson                      .       Democrat


The missing values occur where entries in CANDIDAT have no
matching value for NAME$ in ELECTION (e.g. Ford, Goldwater,
Humphrey, etc. did not win). SYSTAT sets to missing the variables that
came from ELECTION (LOSER$ and YEAR) for these cases. The
ELECTION file has three names that have more than one entry:
Eisenhower, Nixon, and Reagan. In these cases, SYSTAT replicates the
corresponding values from CANDIDAT.


Note: if you subset variables while merging, you must include the key
variable(s) in both subsets. For example, the first command below
works, and the second command does not.


USE INDOOR(TIME,LOC,CO2) OUTDOOR(TIME,LOC,NOX)/TIME,LOC
USE INDOOR(TIME,LOC,CO2) OUTDOOR(NOX)/TIME,LOC


Appending vertically

APPEND joins two files vertically. The files must have the same vari-
ables in the same order. SYSTAT places cases from the second file you
name below those from the first.


Examples are: 


SAVE FINANCE 
APPEND SALES UPDATES 


SAVE PEOPLE 
APPEND MALES FEMALES 


APPEND is HOT, like RUN. SYSTAT executes APPEND immedi-
ately. To save the appended files permanently, issue a SAVE command
before APPEND. SAVE is the only prior command that affects
APPEND.


5.8 APPEND example

Here are two SYSTAT files, named MEN and WOMEN:


MEN                  WOMEN

SEX$     AGE         SEX$       AGE

MALE     18.000      FEMALE     23.000
MALE     35.000      FEMALE     40.000
MALE     24.000      FEMALE     40.000
MALE     20.000      FEMALE     31.000


To append them into a file named SEXES, type: 


SAVE SEXES 
APPEND WOMEN MEN 


8 cases and 2 variables processed. 
SYSTAT file created. 


The new file contains: 


           SEX$       AGE

Case 1     FEMALE     23.000
Case 2     FEMALE     40.000
Case 3     FEMALE     26.000
Case 4     FEMALE     31.000
Case 5     MALE       18.000
Case 6     MALE       35.000
Case 7     MALE       24.000
Case 8     MALE       20.000


SYSTAT placed the cases from WOMEN before those from MEN
because we listed WOMEN first in the APPEND command.
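
The vertical concatenation shown above can be pictured with a short Python sketch. It is an illustration, not SYSTAT code: the append function, the file layout (a dictionary with a variable list and a row list) and the sample values are all assumptions. The check at the top mirrors the stated requirement that both files have the same variables in the same order.

```python
def append(first, second):
    """Stack the cases of `second` below those of `first`."""
    if first["vars"] != second["vars"]:
        raise ValueError("files must have the same variables in the same order")
    return {"vars": list(first["vars"]), "rows": first["rows"] + second["rows"]}


# Made-up files with two cases each.
group_a = {"vars": ["ID$", "SCORE"], "rows": [("A1", 1.0), ("A2", 2.0)]}
group_b = {"vars": ["ID$", "SCORE"], "rows": [("B1", 3.0), ("B2", 4.0)]}

# The file named first supplies the first cases of the result.
combined = append(group_b, group_a)
for case, row in enumerate(combined["rows"], start=1):
    print(case, row)
```

Reversing the argument order reverses the block order, just as listing WOMEN before MEN put the WOMEN cases first.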













6 Transforming variables 





Command reference
Basics
    How to transform
Operators, functions, and built-in variables
    Arithmetic operators
    Functions
    Relational operators
    Logical operators
    Multi-variable functions
    Distribution functions
    Built-in variables
    Order of operations
    Missing values
Statements
    Simple transformations using LET
        6.1 Re-expression
        6.2 Creating new variables
        6.3 Multiple LET statements
    IF... THEN LET
        6.4 Simple conditional transformation
        6.5 IF... THEN using logical OR
        6.6 IF... THEN using logical AND
    Recoding values using CODE
        6.7 Simple CODE
        6.8 Conditional CODE
    Creating character variables using LABEL
        6.9 Simple LABEL
    Lagging with LAG
        6.10 First order lag







Overview

This chapter shows you how to perform simple transformations of your
data. You can do many of the transformations using the Data Editor (see
the Getting Started manual). For repetitive transformations or more
complex programs, though, you need DATA, with the LET and
IF... THEN commands. LET and IF... THEN work the same as
Math... and Recode... from the Editor menu.


With DATA, you can also recode variables using CODE and create
character equivalents of numeric variables using LABEL. All of these
transformations can be used in more complex programs, which are dis-
cussed in the next chapter.
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Command reference 


CODE varlist /
     old1=new1,
     old2=new2, ...,
     oldp=newp

Recodes the variables listed in varlist. For all the variables in varlist,
any case with value old1 is replaced with value new1. All occurrences
of old2 are replaced with new2, etc.

All variables in varlist must be the same type (character or numeric),
and the oldp and newp values must correspond to the variable type.
Surround character strings with single or double quotation marks.


IF exprn THEN statement

Executes statement if the exprn evaluates as true. Exprn may be any
valid expression formed with numbers, variables, operators, and
functions. Statement may be any valid command, including DELETE.


LABEL varlist /
     old1=label1,
     old2=label2, ...,
     oldp=labelp

Creates a character variable for each numeric variable in varlist.
Varlist can contain numeric variables only. For each numeric variable,
a character variable with the same name plus $ is created, with values
as given by oldi=labeli. If any character variable already exists, its
values are replaced.


LET var=exprn

Assigns the value of exprn to the variable var. You may use either a
numeric or character variable. Character values must be surrounded by
single or double quotation marks.
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Basics 


The statements in this chapter allow you to do things like:

1) Re-express variables, e.g.
   LET WEIGHT = LOG(WEIGHT)

2) Create new variables, e.g.
   LET GRADE = QUIZ1+QUIZ2 + 2*FINAL

3) Create grouping variables, e.g.
   IF AGE>21 THEN LET AGE$ = 'ADULT'

4) Create value labels for numeric codes, e.g.
   LABEL SEX / 1='Female',2='Male'

5) Recode variable categories, e.g.
   CODE GAUL / 1=1,2=1,3=2




How to transform

You generally do transformations this way:


USE filename
transformations
SAVE newfile
RUN


DATA does not execute transformations until you type RUN. This al- 
lows you to do many transformations at once. The file that you SAVE 
contains all the data in the original file plus any changes or additions 
that you make. In DATA this saved file automatically becomes the file 
in use. If you want to go back to the original file, you must USE it again.


To retain transformed data permanently, you must save your work to a 
new file. If you do not explicitly specify a SAVE file, DATA stores the 
results of a run in a temporary data file which then becomes the active 
file. It stays open as you move from module to module. SYSTAT erases 
this temporary file when you execute a USE, SAVE, or QUIT com- 
mand. 


You cannot save to the file in use. The Data Editor allows you to save to
the same file that you opened, but DATA does not. To save the results
of transformations, you must name a new file with a SAVE command
sometime before you type RUN.


If you insist on saving to the original file, you can do so by executing the 
transformations, issuing a SAVE, and typing RUN again: 


USE DEJAVU 
transformations 
RUN 
SAVE DEJAVU 
RUN 
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Operators, functions, and built-in variables 





DATA makes use of operators and functions in expressions. An operator 
or function tells DATA to execute an operation or comparison on spec- 
ified values or variables. 


The operators and functions available are the same ones that are avail- 
able in the Data Editor. They are listed with brief explanations here. 
Note that logs to the base 10 are the same as LOG(X)/LOG(10). Sim- 
ilarly, logs to the base 2 are the same as LOG(X)/LOG(2). The trigono- 
metric functions (SIN, COS, etc.) are in radians. To use degrees, re- 
express like this: SIN(X*6.283/360) 
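
These conversions are ordinary change-of-base and degree-to-radian identities, and can be checked numerically. The Python sketch below is only a check of the arithmetic, not SYSTAT code, and the function names are hypothetical. Python's math.log is the natural logarithm, like LOG here.

```python
import math


def log10_via_ln(x):
    """Base-10 logarithm built from natural logs: LOG(X)/LOG(10)."""
    return math.log(x) / math.log(10)


def log2_via_ln(x):
    """Base-2 logarithm built from natural logs: LOG(X)/LOG(2)."""
    return math.log(x) / math.log(2)


def sin_degrees(x):
    """Sine of an angle given in degrees: SIN(X*2*pi/360)."""
    return math.sin(x * 2 * math.pi / 360)


print(log10_via_ln(1000.0))             # close to 3
print(log2_via_ln(8.0))                 # close to 3
print(sin_degrees(30.0))                # close to 0.5
print(math.sin(30.0 * 6.283 / 360))     # the rounded constant 6.283 from the text
```

The constant 6.283 is 2*pi rounded to three decimals, so results computed with it agree with the exact conversion to roughly four decimal places.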


Arithmetic operators

+      addition
-      subtraction
*      multiplication
/      division
^      exponentiation
-      unary minus (negative)


Functions

SQR    square root
LOG    natural logarithm
EXP    exponential function
ABS    absolute value
SIN    sine
COS    cosine
TAN    tangent
ASN    arcsine
ACS    arccosine
ATN    arctangent
ATH    arc hyperbolic tangent (Fisher's z)
INT    integer truncation
LAG    lag (shift values down one case)
LGM    log gamma


Relational operators

<      less than
=      equal to
>      greater than
<>     not equal to
<=     less than or equal to
>=     greater than or equal to


Logical operators

AND    logical and
OR     logical or
NOT    logical not


Multi-variable functions

AVG    mean of nonmissing values
SUM    sum of nonmissing values
MIN    minimum of values
MAX    maximum of values
STD    standard deviation of nonmissing values
MIS    number of missing values


Distribution functions

Distribution    Cumulative         Inverse            Random data
Uniform         UCF(x)             UIF(α)             URN
Normal          ZCF(z)             ZIF(α)             ZRN
T               TCF(t,df)          TIF(α,df)          TRN(df)
F               FCF(F,df1,df2)     FIF(α,df1,df2)     FRN(df1,df2)
Chi-square      XCF(χ²,df)         XIF(α,df)          XRN(df)
Exponential     ECF(x)             EIF(α)             ERN
Gamma           GCF(γ,p)           GIF(α,p)           GRN(p)
Beta            BCF(β,p,q)         BIF(α,p,q)         BRN(p,q)


The ZRN function, for example, generates random values from a nor-
mal distribution (z scores). If you used one of these scores in the ZCF
function, the result would be the area under the normal curve to the left
of this score. If you entered this area in the ZIF function, the original z
score would be returned.
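
The round trip described above (random score, cumulative area, inverse back to the score) can be demonstrated with Python's standard library. This is an analogy rather than SYSTAT code: NormalDist.cdf and NormalDist.inv_cdf play the roles of ZCF and ZIF, random.gauss plays the role of ZRN, and the helper round_trip is hypothetical.

```python
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def round_trip(z):
    """Return (area to the left of z, score recovered from that area)."""
    area = STANDARD_NORMAL.cdf(z)                  # role of ZCF
    return area, STANDARD_NORMAL.inv_cdf(area)     # role of ZIF


rng = random.Random(0)          # fixed seed so the sketch is repeatable
z = rng.gauss(0.0, 1.0)         # role of ZRN
area, z_back = round_trip(z)
print(z, area, z_back)
```

The recovered score matches the original up to floating-point error, and a score of 0 corresponds to an area of 0.5.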


Built-in variables

Built-in variables allow you to index aspects of files:


BOF     beginning of file
EOF     end of file
BOG     beginning of group
EOG     end of group
CASE    case (observation) number


BOG and EOG are defined only with an associated BY statement. See
the Subgroup processing chapter for more information.


Order of operations

Expressions are evaluated from left to right according to the precedence
of operators. That is, operators with higher precedence are evaluated
before those with lower. Order of precedence from highest to lowest
runs as follows:

1) Expressions enclosed in parentheses    ()
2) Exponentiation                         ^
3) Unary minus                            -
4) Multiplication and division            *, /
5) Addition and subtraction               +, -
6) Relational operators                   =, <>, <, >, <=, >=
7) Logical operators                      AND, OR
8) Logical negation                       NOT


Missing values

Missing character data values appear in SYSTAT commands and output
as blanks. Missing numeric data values appear as periods (.). SYSTAT
stores missing numeric data as a negative value smaller than any number
allowed in SYSTAT arithmetic, -1.0E36. This is because all logical
comparisons, including < and >, must evaluate to TRUE (1) or FALSE (0).
Otherwise, statements such as GOTO would not work properly.


Logical comparisons with missing numeric data evaluate as follows:


X<.     is always false
X=.     is true if X is missing and false otherwise
X>.     is false if X is missing and true otherwise
X<>.    is false if X is missing and true otherwise
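
These four rules follow directly from storing the missing value as a number below every legal value. The Python sketch below illustrates that reasoning and is not SYSTAT code: MISSING is set to the stored value quoted above, the comparison helpers are hypothetical, and every non-missing value is assumed to be greater than -1.0E36.

```python
MISSING = -1.0e36   # the stored value the manual gives for missing numeric data


def lt(x, y):
    return int(x < y)


def eq(x, y):
    return int(x == y)


def gt(x, y):
    return int(x > y)


def ne(x, y):
    return int(x != y)


for x in (MISSING, -5.0, 0.0, 3.2):
    label = "." if x == MISSING else x
    # Columns: X, X<., X=., X>., X<>.
    print(label, lt(x, MISSING), eq(x, MISSING), gt(x, MISSING), ne(x, MISSING))
```

Because nothing legal is smaller than the stored value, X<. can never be true, and every comparison still returns a definite 1 or 0.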


All numeric arithmetic expressions involving missing values propagate 
missing values. For example, if you have two variables X and Y in a 
SYSTAT file and you sum them as follows: 


LET SUM =X + Y 


the value for the new variable SUM is missing for every case where ei- 
ther X or Y or both are missing. 


Y    X    SUM
1    1     2
2    .     .
.    3     .


Only the multi-variable functions AVG, SUM, MIN, MAX, and STD
automatically exclude missing values from computations.
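
The contrast between ordinary arithmetic and the multi-variable functions can be sketched in Python. This is an illustration, not SYSTAT code: None stands in for a missing value here (rather than the stored number), the helper names add, avg and mis are hypothetical, and returning missing from avg when every argument is missing is an assumption.

```python
MISSING = None   # stand-in for a missing value in this sketch


def add(x, y):
    """Ordinary arithmetic: any missing operand makes the result missing."""
    if x is MISSING or y is MISSING:
        return MISSING
    return x + y


def avg(*values):
    """Multi-variable mean: missing arguments are left out of the computation."""
    present = [v for v in values if v is not MISSING]
    if not present:
        return MISSING            # assumption: nothing to average
    return sum(present) / len(present)


def mis(*values):
    """Number of missing arguments."""
    return sum(1 for v in values if v is MISSING)


print(add(1, 1))               # both present
print(add(2, MISSING))         # missing propagates
print(avg(2, MISSING, 4))      # mean of the two values that are present
print(mis(2, MISSING, 4))
```

So X + Y is missing whenever either operand is missing, while AVG-style functions still return a value computed from whatever is present.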





Statements 





Simple transformations using LET

The LET statement has the format:


LET var=exprn


SYSTAT assigns the value of the expression exprn to the variable named 
by var. Examples of LET statements are: 


LET X=2
LET X=Y
LET X=Y+2
LET RATE=CARDIO+CANCER
LET LCARDIO=LOG(CARDIO)


You can use LET to transform an existing variable or to create new 
variables. The expression on the right side of the equal sign can be any 
general mathematical expression on real numbers or characters. If an 
expression results in an illegal value (as would dividing by zero), 
SYSTAT sets the value for that case to missing. Also, missing values in 
arithmetic propagate missing values (e.g., 2 +. = .). 


Any variable that you use on the right of the equal sign must exist in the 
file. If it does not, SYSTAT displays the following error message: 


Warning: you are using an uninitialized variable. 
Its value will be set to missing. 


You can execute only one transformation with one LET command. For 
instance, SYSTAT does not allow statements like the following: 


LET X=2 AND Y=3 
LET X=2 AND LET Y=3 


SYSTAT does allow the following expression, however: 


LET OK = X=2 AND Y=3


This statement shows two uses of the equal sign. The first is the equal 
sign designating assignment of the computed value to the variable OK. 
The second two are relational operators. The value of OK will be 1 for 
any case where X is 2 and Y is 3. Otherwise, its value will be 0. 
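
The same distinction between assignment and comparison exists in most languages. A Python analogue of this statement (an illustration, not SYSTAT code; let_ok is a hypothetical name) uses = for the assignment and == for the two comparisons:

```python
def let_ok(x, y):
    """Analogue of LET OK = X=2 AND Y=3: 1 when both comparisons hold, else 0."""
    ok = int(x == 2 and y == 3)   # '=' assigns; '==' compares
    return ok


for x, y in [(2, 3), (2, 4), (1, 3)]:
    print(x, y, let_ok(x, y))
```

Only the first pair satisfies both comparisons.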


6.1 Re-expression

In the following program the values of the variable CARDIO in the file
USDATA are transformed into their natural logs. The first ten values of
CARDIO are:

            CARDIO
Case 1     466.200
Case 2     395.900
Case 3     433.100
Case 4     460.600
Case 5     474.100
Case 6     423.800
Case 7     499.500
Case 8     464.700
Case 9     508.700
Case 10    443.100


To calculate the natural logs of these values, first USE the file, enter the
LET statement, specify a SAVE file, and then set it all in motion with
RUN:


USE USDATA
LET CARDIO = LOG(CARDIO)
SAVE NEWDATA
RUN


To list the first 10 transformed values of CARDIO, type:


REPEAT 10
LIST CARDIO
RUN

            CARDIO
Case 1       6.145
Case 2       5.981
Case 3       6.071
Case 4       6.133
Case 5       6.161
Case 6       6.049
Case 7       6.214
Case 8       6.141
Case 9       6.232
Case 10      6.094


10 cases and 43 variables processed.
No SYSTAT file created.


6.2 Creating new variables

In the above example, we transformed the values of CARDIO to equal
their natural logs. The original values were not saved. To save both the
original and natural log values of CARDIO, issue a LET statement to
create a new variable instead of transforming an existing one.


NEW 

USE USDATA 
LET LCARDIO = LOG(CARDIO) 
SAVE NEWFILE 
RUN 


To list the first 10 cases of both CARDIO and LCARDIO, type: 


REPEAT 10 
LIST CARDIO, LCARDIO
RUN 

            CARDIO    LCARDIO
Case 1     466.200      6.145
Case 2     395.900      5.981
Case 3     433.100      6.071
Case 4     460.600      6.133
Case 5     474.100      6.161
Case 6     423.800      6.049
Case 7     499.500      6.214
Case 8     464.700      6.141
Case 9     508.700      6.232
Case 10    443.100      6.094


10 cases and 44 variables processed.
No SYSTAT file created.


6.3 Multiple LET statements

You can execute many transformation statements at once. The following
program executes three transformations and saves the results into a file
called NEWDATA:


USE USDATA
LET LCARDIO = LOG(CARDIO)
LET RATE = CARDIO+CANCER
LET ALCOHOL = SPIRITS+WINE+BEER
SAVE NEWDATA
RUN


Note that each transformation is on its own line. You may include only 
one transformation, such as a LET statement, per line. 


IF... THEN LET

With the IF... THEN statement, you can execute conditional transfor-
mations. The format for an IF... THEN statement is:


IF condition THEN LET expression


Examples are:


IF X=99 THEN LET X=. 
IF CARDIO>400 AND 100<CANCER THEN LET RATE$='EXTREME' 


6.4 Simple conditional transformation

The average value for the variable CARDIO is approximately 398 with a
standard deviation of 84. To indicate states where CARDIO is more
than one standard deviation greater than the mean, use a conditional
transformation statement as follows:


USE USDATA
SAVE NEWFILE
IF CARDIO>482 THEN LET RATE$='HIGH'
RUN


50 cases and 44 variables processed.
SYSTAT file created.


Now list the first ten cases for the variables CARDIO and RATE$ in
NEWFILE by typing:


REPEAT 10
LIST CARDIO, RATE$
RUN

            CARDIO    RATE$
Case 1     466.200
Case 2     395.900
Case 3     433.100
Case 4     460.600
Case 5     474.100
Case 6     423.800
Case 7     499.500    HIGH
Case 8     464.700
Case 9     508.700    HIGH
Case 10    443.100


10 cases and 44 variables processed. 
No SYSTAT file created. 


Two of the first ten cases in the file meet the condition CARDIO>482;
for these cases SYSTAT assigns the value "HIGH" to the new variable
RATE$. For all cases that do not meet the condition, SYSTAT sets
RATE$ to blank, indicating a missing value.


6.5 IF... THEN using logical OR

The average value for the variable CANCER in the data set USDATA
is approximately 178, with a standard deviation of 33. To create a vari-
able that indicates states whose values are more than one standard de-
viation above the average for CARDIO or CANCER or both, use the
following transformation program:


USE USDATA
SAVE NEWFILE
IF CARDIO>482 OR CANCER>211 THEN LET RATE$='HIGH'
LIST CARDIO CANCER RATE$
RUN


Here is a listing of the first 10 cases:

            CARDIO     CANCER    RATE$

Case 1     466.200    213.800    HIGH
Case 2     395.900    182.200
Case 3     433.100    188.100
Case 4     460.600    219.000    HIGH
Case 5     474.100    231.500    HIGH
Case 6     423.800    205.100
Case 7     499.500    209.900    HIGH
Case 8     464.700    216.300    HIGH
Case 9     508.700    223.600    HIGH
Case 10    443.100    198.800


Now, for every case where CARDIO is greater than 482 or CANCER is
greater than 211 or both, SYSTAT assigns RATE$ a value of HIGH.
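
The conditional assignment can be reproduced in a few lines of Python, using the ten CARDIO and CANCER values printed in the listing above. This is a sketch of the logic rather than SYSTAT code. The helper rate is hypothetical, and an empty string stands in for a blank (missing) character value. The second call applies the AND form of the same condition for comparison.

```python
# (CARDIO, CANCER) for the first ten cases, copied from the listing above.
FIRST_TEN = [
    (466.2, 213.8), (395.9, 182.2), (433.1, 188.1), (460.6, 219.0),
    (474.1, 231.5), (423.8, 205.1), (499.5, 209.9), (464.7, 216.3),
    (508.7, 223.6), (443.1, 198.8),
]


def rate(condition):
    """Assign 'HIGH' where the condition holds and leave a blank elsewhere."""
    return ["HIGH" if condition(cardio, cancer) else ""
            for cardio, cancer in FIRST_TEN]


def high_cases(labels):
    """1-based case numbers that received 'HIGH'."""
    return [case for case, label in enumerate(labels, start=1) if label == "HIGH"]


either = rate(lambda cardio, cancer: cardio > 482 or cancer > 211)
both = rate(lambda cardio, cancer: cardio > 482 and cancer > 211)

print(high_cases(either))
print(high_cases(both))
```

With OR, six of the ten cases are flagged; with AND, only the case that exceeds both thresholds is flagged.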


6.6 IF... THEN using logical AND

To create a variable that indicates those cases where both CARDIO and
CANCER are more than one standard deviation above the average, type
the following:


USE USDATA
SAVE NEWFILE
IF CARDIO>482 AND CANCER>211 THEN LET RATE$='HIGH'
LIST CARDIO, CANCER, RATE$
RUN


Here are the first 10 transformed cases:


            CARDIO     CANCER    RATE$

Case 1     466.200    213.800
Case 2     395.900    182.200
Case 3     433.100    188.100
Case 4     460.600    219.000
Case 5     474.100    231.500
Case 6     423.800    205.100
Case 7     499.500    209.900
Case 8     464.700    216.300
Case 9     508.700    223.600    HIGH
Case 10    443.100    198.800


Only one of the first ten cases has both CARDIO greater than 482 and
CANCER greater than 211.


Recoding values using CODE

The CODE command provides a convenient way to recode or collapse
categories of categorical values.


Examples are: 


CODE REGION, DIVISION/1=2, 3=2, 4=1
CODE STATE$/'NY'='East of Eden', 'IL'='Eden', 'CA'='West of Eden'


You can reference more than one variable in a CODE command, but
you cannot mix numeric and character variables in the same CODE
statement. To create a character variable from a numeric variable, use
LABEL or a series of IF... THEN statements.


You can also execute conditional codes as follows:


IF exprn THEN FOR
CODE specification
NEXT


See the next chapter for more information about this type of statement.


The syntax of the CODE statement is:


CODE varlist / oldvalue1=newvalue1,
oldvalue2=newvalue2,...


You must be careful about the order in which you specify the recoding
because SYSTAT recodes in the order that you list the changes. For in-
stance, if you incorrectly enter:


CODE REGION/1=2, 2=3, 3=1


SYSTAT recodes REGION to 1 for all cases where REGION is 1, 2, or
3. It first codes all 1's to 2's, then all 2's to 3's (including those that had
just been changed from 1 to 2), and then all 3's back to 1's. Here's a cor-
rect way to change 1's to 2's, 2's to 3's, and 3's to 1's.


CODE REGION/1=11, 2=12, 3=1, 11=2, 12=3


In this way, all 1's are changed to 11's, then all 2's to 12's, then all 3's to
1's, then 11's to 2's, and finally 12's to 3's.
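
The effect of the sequential rule is easy to reproduce. The Python sketch below imitates the behavior described here and is not SYSTAT code: the function code is hypothetical and applies each old=new pair to the whole list before moving to the next pair.

```python
def code(values, pairs):
    """Apply each (old, new) recode in turn, in the order listed."""
    result = list(values)
    for old, new in pairs:
        result = [new if v == old else v for v in result]
    return result


region = [1, 2, 3]

# CODE REGION/1=2, 2=3, 3=1 : every value ends up as 1.
print(code(region, [(1, 2), (2, 3), (3, 1)]))

# CODE REGION/1=11, 2=12, 3=1, 11=2, 12=3 : the intended rotation.
print(code(region, [(1, 11), (2, 12), (3, 1), (11, 2), (12, 3)]))
```

The temporary codes 11 and 12 work only because they do not already occur in the data; any values that are unused would serve equally well.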
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Statements Transforming variables 


Note to SPSS users: the syntax of this statement resembles, but is not 
identical to, the SPSS RECODE statement. The value on the left of 
each equal sign is recoded into the value on the right. If you want to re- 
code several values into the same value, use several equal signs as in the 
example above. To recode continuous variables into categories, use a 
series of IF... THEN statements as shown at the end of the Tables 
chapter. 


6.7 In the data set USDATA, the first 14 cases contain states in divisions 1, 
Simple CODE 2, and 3. The following program copies the division values into the vari- 
able DIVISN2 and changes these values so that 1=3, 2=1, and 3=2. 


USE USDATA 
SAVE NEWFILE 
REPEAT 14 
LET DIVISN2=DIVISION 
CODE DIVISN2/1=11, 2=12,3=2, 1193, 12=1 





RUN 


The values for the first 14 cases of DIVISION and DIVISN2 in the
data set NEWFILE are:


           DIVISION   DIVISN2
Case 1        1.000     3.000
Case 2        1.000     3.000
Case 3        1.000     3.000
Case 4        1.000     3.000
Case 5        1.000     3.000
Case 6        1.000     3.000
Case 7        2.000     1.000
Case 8        2.000     1.000
Case 9        2.000     1.000
Case 10       3.000     2.000
Case 11       3.000     2.000
Case 12       3.000     2.000
Case 13       3.000     2.000
Case 14       3.000     2.000


6.8 Conditional CODE
In USDATA, the first 9 cases contain states in region 1. The following
program executes the recode from the previous example only for those
states where REGION equals 1. The FOR...NEXT statement is ex-
plained in the next chapter; this example illustrates its use for recodes.


USE USDATA 
SAVE NEWFILE 
LET DIVISN2=DIVISION 
IF REGION=1 THEN FOR 
CODE DIVISN2/1=11, 2=12, 3=2, 11=3, 12=1
NEXT 
RUN 


Comparing the original and recoded values of DIVISION shows that 
SYSTAT has recoded only those values where REGION=1: 


                     original   recoded
           REGION    DIVISION   DIVISN2
Case 1      1.000       1.000     3.000
Case 2      1.000       1.000     3.000
Case 3      1.000       1.000     3.000
Case 4      1.000       1.000     3.000
Case 5      1.000       1.000     3.000
Case 6      1.000       1.000     3.000
Case 7      1.000       2.000     1.000
Case 8      1.000       2.000     1.000
Case 9      1.000       2.000     1.000
Case 10     2.000       3.000     3.000
Case 11     2.000       3.000     3.000
Case 12     2.000       3.000     3.000
Case 13     2.000       3.000     3.000
Case 14     2.000       3.000     3.000


Creating character variables using LABEL
The LABEL command creates a character variable whose values corre-
spond to those of a numeric variable.

An example:

LABEL DIVISION/1='New England',2='Mid Atlantic',
3='North Central'
84 © 1989, SYSTAT, Inc. 


LABEL names the new character variable by adding a dollar sign to the
numeric variable’s name. In the example above, LABEL DIVISION
adds the variable DIVISION$ to your file. If DIVISION$ already exists
in the file, SYSTAT replaces its values with those created by the
LABEL command.


LABEL cannot create unique character variable names for subscripted
variables. If you LABEL a subscripted variable, SYSTAT creates a
counterpart character variable without the subscript. For example, if you
use LABEL on the variable QUESTION(3), SYSTAT creates the char-
acter variable QUESTION$. Using LABEL with QUESTION(4) also
produces QUESTION$.
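The naming rule and the value mapping can be pictured with a short Python sketch. It is not SYSTAT code; the helper names label_variable_name and apply_labels are hypothetical, and the choice to return a blank for a code that has no label is an assumption, not something this manual states.

```python
import re


def label_variable_name(numeric_name):
    """Name of the character variable LABEL creates: subscript dropped, '$' added."""
    base = re.sub(r"\(\d+\)$", "", numeric_name)  # remove a trailing subscript such as (3)
    return base + "$"


def apply_labels(values, labels):
    """Map numeric codes to labels; unlabeled codes become blank (an assumption)."""
    return [labels.get(v, "") for v in values]


labels = {1: "New England", 2: "Mid Atlantic", 3: "North Central"}
division = [1, 2, 3, 1]

print(label_variable_name("DIVISION"))      # DIVISION$
print(apply_labels(division, labels))
print(label_variable_name("QUESTION(3)"))   # QUESTION$
print(label_variable_name("QUESTION(4)"))   # QUESTION$
```

Because QUESTION(3) and QUESTION(4) map to the same name, labeling the second overwrites the character variable created for the first.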


You can use these new character variables in place of the numeric vari- 
ables in any SYSTAT statistical procedure that allows value labels, such 
as TABLES. For example, you can use the following statement to make 
value labels in DATA: 


LABEL SEX/1='Male',2='Female'


and then you can tabulate SEX$ with Tables/Tabulate... in the Stats
menu.
Create the file GENDER with the following commands: 


SAVE GENDER 

INPUT SEX 

LABEL SEX/1='MALE', 2='FEMALE'
RUN 

1 


GENDER will contain two variables: SEX, which you entered by hand,
and SEX$, which SYSTAT created from SEX:


             SEX    SEX$
Case 1     1.000    MALE
Case 2     2.000    FEMALE
Case 3     2.000    FEMALE
Case 4     1.000    MALE
Case 5     2.000    FEMALE
Case 6     2.000    FEMALE
Case 7     1.000    MALE


Lagging with LAG
The LAG function shifts values down one row, replacing the first value
with a missing value.


Examples are: 


LET Y=LAG(X)
LET Z=LAG(LOG(X)) 


The first example produces a new variable Y whose values are those of 
X, shifted down one position: 


X   Y
1   .
2   1
3   2
4   3
4 3 


The second example produces a new variable Z whose values are the 
logs of X, shifted down one position. 
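As a cross-check on the two examples, here is a minimal Python sketch of the shift. It is not SYSTAT code: the name lag and the use of None for the missing value are assumptions made for illustration. LOG is taken to be the natural logarithm, as stated in the next chapter.

```python
import math

MISSING = None  # stand-in for SYSTAT's missing value, listed as "."


def lag(values):
    """Shift values down one row; the first row becomes missing."""
    values = list(values)
    return [MISSING] + values[:-1]


x = [1, 2, 3, 4]
y = lag(x)                              # models LET Y=LAG(X)
z = lag([math.log(v) for v in x])       # models LET Z=LAG(LOG(X))

print(y)  # [None, 1, 2, 3]
```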


You cannot LAG a variable twice in one run. For example, neither of
the following would work.


LET Y=LAG(LAG(Y))


LET Y=LAG(X)
LET Z=LAG(Y)


If you want to lag a variable or expression twice, use successive runs:


LET Y=LAG(X)
RUN
LET Z=LAG(Y)
RUN


6.16 First order lag
Here we input a variable Y and lag its square.


INPUT Y
SAVE TEST
RUN
1
2
3
4
5
6


6 cases and 1 variables processed
SYSTAT file created


USE TEST
LET Z=LAG(Y^2)
SAVE TEST2
RUN


6 cases and 2 variables processed
SYSTAT file created


USE TEST2
RUN


               Y         Z
Case 1     1.000         .
Case 2     2.000     1.000
Case 3     3.000     4.000
Case 4     4.000     9.000
Case 5     5.000    16.000
Case 6     6.000    25.000


7 Programming in SYSTAT


Command reference
Errors
Erasing BASIC statements
Statements and expressions
Missing values
ELSE
7.1 Using IF...THEN...ELSE to simplify a program
FOR...NEXT


Overview
This chapter provides general rules and guidelines for using the
SYSTAT BASIC data transformation language. You already learned the
rudiments of BASIC in the previous chapter: LET for simple transfor-
mations, and IF...THEN for conditional transformations.


You do not need to use SYSTAT BASIC for any but the most compli-
cated transformations. The previous chapter discusses simple transfor-
mations, and the Getting Started volume gives instruction on doing
transformations through the SYSTAT Data Editor.


Examples of BASIC programs are given in this chapter and in the
Programming examples chapter.


Command reference


DIM var(n)                   Reserves space for a new variable var
                             with subscript n, where n is an integer
                             between 1 and 99 inclusive.

ELSE statement               Can follow an IF...THEN command.
                             Statement is executed when the IF exprn
                             evaluates as false. The statement can be
                             any valid command, including DELETE or
                             another IF...THEN command.

ERASE n1[-n2]                Erases all numbered BASIC statements
                             from n1 to n2, inclusive. The default, if
                             no range is specified, is all numbered
                             statements.

FOR [index=n1 TO n2          Starts a FOR...NEXT loop. Index must be
[STEP=n3]] ... NEXT          a numeric variable, either from your file
                             or a new variable. You must specify n1,
                             but n2 is optional. You may optionally
                             specify an increment value with the
                             STEP=n3 phrase; the default is +1. You
                             may specify any real number or expres-
                             sion for n1, n2, and n3. See text for
                             instructions on using FOR...NEXT with or
                             without an index.

GOTO n                       Detours the program to the statement
                             numbered n. You must have numbered
                             line statements in your program to use
                             GOTO.

IF exprn THEN statement      Executes statement if the exprn evaluates
                             as true. Exprn may be any valid expres-
                             sion formed with numbers, variables, op-
                             erators, and functions. Statement may be
                             any valid command, including DELETE.

LET var=exprn                Assigns the value of exprn to the variable
                             var. You may use either a numeric or
                             character variable. Character values must
                             be surrounded by single or double quota-
                             tion marks.

PRINT varlist | 'string'     Displays the values of the variables listed
                             in varlist, or displays the character string
                             you specify. Varlist may include numeric
                             or character variables. See Chapter 1 for
                             information about using the varlist argu-
                             ment. Character string arguments are dis-
                             cussed in this chapter.

STOP                         Stops execution of a BASIC program.


Introduction to SYSTAT BASIC


SYSTAT BASIC, provided in DATA, has all the transformation power
of the Math... and Recode... items of the Editor menu plus much
more.


In the Data Editor, you must do transformations individually, and you
are limited to simple “Set variable to exprn” transformations and simple
conditional “If exprn then set variable to exprn” transformations. The
previous chapter showed how you can do these simple transformations
with DATA.


SYSTAT BASIC lets you do more complicated tasks like the following:


Execute several transformations at one time
Use array (subscripted) variables
Do FOR...NEXT loops
Do IF...THEN...ELSE statements
Calculate unusual statistics such as trimmed means
Generate unusual random data sets


General usage
You will typically use the following format for any SYSTAT BASIC
program:


USE filename
BASIC program
SAVE newfile
RUN


When you enter the RUN command, SYSTAT executes the BASIC
program you have entered. SYSTAT runs the program once for each
case in your data file.
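The once-per-case behavior is the key to reading every program in this chapter. The following Python sketch is a loose model, not SYSTAT code: the names run and program are hypothetical, a data file is represented as a list of dictionaries, and the two CARDIO values are illustrative only.

```python
import math


def run(program, cases):
    """Execute `program` once for each case and return the transformed cases."""
    results = []
    for case in cases:
        new_case = dict(case)   # work on a copy so the input file is left unchanged
        program(new_case)
        results.append(new_case)
    return results


def program(case):
    # models the single statement LET X=LOG(CARDIO)
    case["X"] = math.log(case["CARDIO"])


usdata = [{"CARDIO": 466.2}, {"CARDIO": 395.9}]
newdata = run(program, usdata)   # models SAVE NEWDATA followed by RUN

print(len(newdata), sorted(newdata[0]))
```

The input list is never modified, which mirrors the way SYSTAT writes transformed values to a new or temporary file rather than to the file named in USE.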


Saving your work
Whenever you transform variables using SYSTAT BASIC, you must
SAVE the transformed values to a new file to preserve your work per-
manently. SYSTAT will not add the transformed values to the original
file. If you do not SAVE to a new file, SYSTAT stores the results of
your work in a temporary data file.


For example, the following program writes all the data in USDATA plus 
the variable X into NEWDATA, and NEWDATA becomes the active 
file. USDATA remains unchanged. 


USE USDATA 

LET X=LOG(CARDIO) 
SAVE NEWDATA 

RUN 


When you issue the SAVE command, you must specify a file name dif-
ferent from the USE file. DATA does not let you write to the current
file.


If you do not use a SAVE command before you type RUN, SYSTAT
stores results in a temporary data file. This file becomes the active file,
and all your commands refer to it until your next RUN command. If
you enter more commands and another RUN without giving a SAVE
statement, DATA overwrites the temporary data file.


The temporary data files remain in memory if you transfer back to the
main SYSTAT menus to do analyses. SYSTAT continues to use a tem-
porary file until you USE a new file (or use File/Open...), SAVE to a
file in DATA, open the Data Editor, or QUIT, at which point SYSTAT
erases all temporary data files. All of this takes place invisibly.


The temporary data files allow you to do many RUNs without having to
SAVE intermediate files. You can do all your data and file manipulations
and then SAVE only the final version. For example:


USE A(X(1-5)) B
LET Z=LOG(Z)
STANDARDIZE X(1-5)
RUN
TRANSPOSE
RUN
etc.
SAVE FINAL
RUN


Line numbers
You may number the lines in your BASIC programs. Any time you be-
gin a command with a line number in DATA, SYSTAT assumes you are
entering a BASIC statement. A line number can be any integer between
0 and 32,000. You may increment line numbers by any amount. You can
enter lines out of order, but SYSTAT executes them in increasing nu-
merical order.


You do not need to give line numbers to BASIC statements. You may
even mix numbered and unnumbered BASIC statements in one RUN. If
you do, SYSTAT executes the unnumbered statements first, in the
order you enter them. It then executes numbered statements in the
order of their statement numbers.


Errors
When you enter a SYSTAT BASIC statement, SYSTAT reads it and
checks for syntax errors, and then stores it in memory for later execu-
tion. If SYSTAT finds an error, it tells you and lets you enter a new
statement and continue programming. The statement with an error is
forgotten.


Editing a BASIC program
Once you enter a BASIC program statement, you can change or erase it
if you have given the statement a line number. Just enter a new state-
ment with the same line number. SYSTAT forgets the first statement
and uses the one you entered instead.


The following statements:


10 LET X=Y
20 LET A=X/10
10 LET X=Z


are stored as: 


10 LET X=Z 
20 LET A=X/10 


Erasing BASIC statements
With the ERASE command you can remove numbered BASIC state-
ments. If, in the above example, you type:


ERASE 10


SYSTAT eliminates statement 10 from the SYSTAT BASIC program.
This is equivalent to issuing a line number only:


10


The advantage of the ERASE command is that you can specify an entire
range of BASIC statements. The following command removes all state-
ments with numbers from 10 to 50:


ERASE 10-50
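

The rules for entering, replacing, erasing, and ordering statements can be summarized in a small Python model. This is only a sketch of the behavior described above, not SYSTAT's implementation; the class name StatementStore and its methods are hypothetical.

```python
class StatementStore:
    """Toy model of how DATA stores BASIC statements before RUN."""

    def __init__(self):
        self.unnumbered = []   # kept in the order entered
        self.numbered = {}     # line number -> statement text

    def enter(self, text, line=None):
        if line is None:
            self.unnumbered.append(text)
        elif text == "":
            self.numbered.pop(line, None)   # a bare line number erases that statement
        else:
            self.numbered[line] = text      # re-entering a number replaces the statement

    def erase(self, first=None, last=None):
        if first is None:                   # ERASE with no range: all numbered statements
            self.numbered.clear()
            return
        last = first if last is None else last
        for n in [n for n in self.numbered if first <= n <= last]:
            del self.numbered[n]

    def execution_order(self):
        """Unnumbered statements first, then numbered ones in increasing order."""
        return self.unnumbered + [self.numbered[n] for n in sorted(self.numbered)]


store = StatementStore()
store.enter("LET X=Y", 10)
store.enter("LET A=X/10", 20)
store.enter("LET X=Z", 10)          # replaces the first statement 10
print(store.execution_order())      # ['LET X=Z', 'LET A=X/10']

store.erase(10)                     # models ERASE 10
print(store.execution_order())      # ['LET A=X/10']
```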


Statements and expressions
SYSTAT BASIC makes use of operators and functions in expressions
and statements. This section defines statements and expressions.


Operators and functions
SYSTAT BASIC uses operators and functions in expressions. The oper-
ators and functions available are the same ones that are available in the
Data Editor. See the Transforming variables chapter or the Getting
Started manual for more information on these operators and functions.


Statements
A statement is a SYSTAT BASIC command followed by its arguments:


command arguments


Examples are:


LET X=Y
GOTO 10


IF...THEN has a special format; the IF and THEN clauses each have
arguments:


IF exprn THEN statement 


Note that the argument for THEN is a statement. For example, legal 
IF... THEN statements are: 


IF exprn THEN LET var = exprn 
IF exprn THEN GOTO n 

IF exprn THEN PRINT var|string 
IF exprn THEN DELETE 

IF exprn THEN IF exprn THEN... 
IF exprn THEN FOR... 


Note that IF... THEN can take another IF... THEN statement as its 
THEN statement. 


Expressions
An expression is a combination of one or more variables (including spe-
cial built-in variables), numbers, character strings, and/or operators
which evaluates to some numeric or character value. There are three
types of expressions: numeric, character, and relational.


Numeric expressions
Numeric expressions contain only numbers, variables, built-in variables
(e.g. CASE), functions, operators, or combinations of these and evaluate
to any real number which has a legal value in SYSTAT.


Examples of numeric expressions are:


2
2+2
SQR(2)
CARDIO
CARDIO+CANCER
SQR(CANCER)


Character expressions 
Character expressions contain only character strings or character vari- 
ables. When you use a character string or value as part of an expression, 
you must enclose that value in quotes. A character string cannot be 
longer than 12 characters. 


Examples are:


'MALE'
av Tt Ss" 


Relational expressions 
The third type of expression is relational, or comparative. It compares 
either two numeric or two character expressions. It consists of two ex- 
pressions of the same type joined by a relational operator (<, >, =, <=, >=, 
or <>). You cannot compare a numeric expression to a character expres- 
sion. 


Examples of relational expressions are: 


REGION = 1 

STATE$ <> 'NY'

CARDIO > CANCER 

400 < (CARDIO+CANCER) 


You may join a number of relational expressions together with the logi- 
cal operators AND or OR to form complex relational expressions, e.g.: 


CARDIO>100 OR CANCER>300
AGE<17 AND SEX$='FEMALE'
(AGE<17 OR AGE>60) AND SEX$='MALE'


You can also negate relational expressions with the logical operator 
NOT. NOT changes the value of a nonzero (true) expression to zero 
(false) and the value of a zero expression to one. For example: 


NOT (AGE>30 OR EXPERNCE>10 OR SCORE>80) 


SYSTAT evaluates a relational expression for each case in your file. If
the expression is true, SYSTAT assigns it a value of one for that case. If
the expression is not true (is false), SYSTAT assigns it a value of zero
for that case. (SYSTAT follows the standard for programming lan-
guages and returns a zero value for false and one for true. Microsoft
BASIC is nonstandard and returns a value of zero for false and minus
one for true.)
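

The one-and-zero convention means that a relational expression can be stored in a variable like any other number. The short Python sketch below mimics this. It is not SYSTAT code; the helper names rel and logical_not are hypothetical and the REGION values are illustrative.

```python
def rel(condition):
    """Return 1 for a true comparison and 0 for a false one."""
    return 1 if condition else 0


def logical_not(value):
    """NOT: a nonzero (true) value becomes 0, and zero becomes 1."""
    return 0 if value != 0 else 1


region = [1, 3, 4, 5]
x = [rel(r > 3) for r in region]        # one value per case for REGION>3

print(x)                                # [0, 0, 1, 1]
print([logical_not(v) for v in x])      # [1, 1, 0, 0]
```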





my, - 





. © 1989, SYSTAT, Inc. m 


i. 
a : 






Inc. 








Programming in SYSTAT 


Missing values 


98 


Introduction to SYSTAT BASIC 


For example, the following command gives X a value of |! for every case 
where REGION is greater than 3 and a value of 0 for cases where 
REGION is 3 or less. 


LET X=REGION>3 


You can place a relational expression in the IF clause of an IF... THEN 
statement. If the expression evaluates to 1 (the expression is true), 
SYSTAT executes the statement following THEN. If the expression 
evaluates to 0 (the expression is not true), SYSTAT does not execute the 
statement . 


Some valid IF... THEN statements: 


IF SEX$='FEMALE' THEN LET GROUP=1 

IF CARDIO>100 OR CANCER>300 THEN LET RATE$='HIGH' 
IF GROUP=1 THEN GOTO 10 

IF EDUCATN<12 THEN DELETE 

IF REGION>3 AND CARDIO<300 THEN FOR 

IF REGION>3 AND IF CARDIO<300 THEN FOR 


Missing character data values appear in SYSTAT commands and output 
as blanks. Missing numeric data values appear as periods (.). SYSTAT 
stores missing numeric data as negative values less than any number al- 
lowed in SYSTAT arithmetic, -1.0E36. This is because all logical com- 
parisons, including < and >, must evaluate to TRUE (1) or FALSE (0). 
Otherwise, statements such as GOTO would not work properly. 


Logical comparisons with missing numeric data evaluate.as follows: 


X<. is always false 
X=. __ is true if X is missing and false otherwise 
X>.__ is false if X is missing and true otherwise 


X <>. is false if X is missing and true otherwise 
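

These four rules follow directly from storing the missing value as a number smaller than any legal value. The Python sketch below reproduces them with the sentinel quoted above. It is not SYSTAT code; the function name compare_with_missing is hypothetical, and the sketch assumes that every non-missing value is greater than -1.0E36.

```python
MISSING = -1.0e36   # the sentinel quoted in the text


def compare_with_missing(x):
    """Evaluate X<., X=., X>. and X<>. for one value of X."""
    return {
        "<": x < MISSING,
        "=": x == MISSING,
        ">": x > MISSING,
        "<>": x != MISSING,
    }


print(compare_with_missing(MISSING))
print(compare_with_missing(42.5))
```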

All numeric arithmetic expressions involving missing values propagate
missing values. For example, if you have two variables X and Y in a
SYSTAT file and you sum them as follows:


LET SUM = X + Y


the value for the new variable SUM is missing for every case where ei-
ther X or Y or both are missing.


Y   X   SUM
1   1   2
2   .   .
.   3   .


Only the multi-variable functions AVG, SUM, MIN, MAX, and STD 
automatically exclude missing values from computations. See the 
Transforming variables chapter for more information about these 
functions. 
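

The difference between ordinary arithmetic and the multi-variable functions can be sketched in Python as follows. This is an illustration rather than SYSTAT code: None stands in for the missing value, the helper names add and avg are hypothetical, and returning missing when every argument is missing is an assumption.

```python
MISSING = None  # stand-in for SYSTAT's missing value


def add(x, y):
    """Ordinary arithmetic: any missing operand makes the result missing."""
    return MISSING if x is MISSING or y is MISSING else x + y


def avg(*values):
    """Multi-variable function: missing values are left out of the computation."""
    present = [v for v in values if v is not MISSING]
    return sum(present) / len(present) if present else MISSING


rows = [(1, 1), (2, MISSING), (MISSING, 3)]
print([add(x, y) for x, y in rows])   # [2, None, None]
print([avg(x, y) for x, y in rows])   # [1.0, 2.0, 3.0]
```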


With the IF... THEN statement, you can execute statements condi- 
tionally. The syntax for an IF... THEN statement is: 


IF condition THEN statement 


The statement that follows THEN can be any legal SYSTAT BASIC 
statement including LET, FOR...NEXT, DELETE, GOTO, and an- 
other IF... THEN. Examples of IF... THEN statements are: 


IF X=99 THEN LET X=. 

IF X<20 THEN DELETE 

IF CARDIO>400 THEN GOTO 100 

IF CARDIO>400 AND CANCER>100 THEN LET RATE$='EXTREME'
IF 2+2 THEN STOP 


The statement following IF...THEN... is executed only if the condition 
following the IF is nonzero (not FALSE). Notice that the condition 
following the IF in the last example (2+2) is nonzero, so the whole 
statement is equivalent to a STOP statement alone. 


You may execute more than one conditional transformation per RUN. 
If you are testing consecutive IF... THEN conditions on the same vari- 
able or variables, you should use IF... THEN...ELSE, discussed below. 


The examples above tested cases for one condition (e.g., CARDIO>400, 
or CARDIO>400 AND CANCER>100). If the case met the condition, 
SYSTAT executed the transformation. If the case did not meet the 
condition, SYSTAT did not execute the transformation. 


You may want, however, to execute many conditional transformations at
once. If you are testing consecutive related conditions on the same vari- 
able, SYSTAT provides an ELSE statement to accompany IF... THEN. 


In its simplest form, IF... THEN and ELSE take the format: 


IF expression THEN statement 
ELSE statement 


SYSTAT executes the statement following ELSE only when the preced-
ing IF condition evaluates to false. Another IF...THEN statement can
follow ELSE, enabling you to string together a number of related
conditional transformations:


IF expression THEN LET var=expression
ELSE IF expression THEN LET var=expression
ELSE IF expression THEN LET var=expression
ELSE LET var=expression


In this case, SYSTAT executes the statement following ELSE only
when all preceding IF conditions are false. When a preceding condition
is true, SYSTAT ignores subsequent ELSE statements.


7.1 Using IF...THEN...ELSE to simplify a program
Here we compare two transformation programs. The first uses only
IF...THEN statements to assign values to a new variable called RATE$
based on values for CARDIO:


USE USDATA
SAVE NEWDATA
IF CARDIO<400 THEN LET RATE$='LOW'
IF CARDIO>=400 AND CARDIO<465,
THEN LET RATE$='AVERAGE'
IF CARDIO>=465 THEN LET RATE$='HIGH'
RUN


Using IF...THEN and ELSE makes the program simpler and more
efficient:


USE USDATA
SAVE NEWDATA
IF CARDIO<400 THEN LET RATE$='LOW'
ELSE IF CARDIO<465 THEN LET RATE$='AVERAGE'
ELSE LET RATE$='HIGH'
RUN
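

To see that the two programs assign the same values, you can compare them on a few test values. The Python sketch below does this. It is not SYSTAT code; the function names are hypothetical, the sample values are illustrative, and missing CARDIO values are not considered.

```python
def rate_if_only(cardio):
    """First program: three independent IF...THEN tests."""
    rate = None
    if cardio < 400:
        rate = "LOW"
    if cardio >= 400 and cardio < 465:
        rate = "AVERAGE"
    if cardio >= 465:
        rate = "HIGH"
    return rate


def rate_if_else(cardio):
    """Second program: each ELSE is reached only when earlier conditions are false."""
    if cardio < 400:
        return "LOW"
    elif cardio < 465:
        return "AVERAGE"
    else:
        return "HIGH"


sample = [395.9, 400.0, 433.1, 464.7, 465.0, 508.7]
for c in sample:
    print(c, rate_if_only(c), rate_if_else(c))
```

The second form needs no lower bound in its middle test, because the ELSE is reached only when CARDIO<400 has already failed.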


SYSTAT executes this program once for each case. The order of the IF
and ELSE statements is important. The ELSE depends on the truth of
the IF conditions before it. SYSTAT executes an ELSE statement only
if all preceding conditions are false.


After running this program, check the values for the first ten cases for
CARDIO and RATE$:


USE NEWDATA 


REPEAT 10 


LIST CARDIO, RATE$ 


RUN 


             CARDIO    RATE$
Case 1      466.200    HIGH
Case 2      395.900    LOW
Case 3      433.100    AVERAGE
Case 4      460.600    AVERAGE
Case 5      474.100    HIGH
Case 6      423.800    AVERAGE
Case 7      499.500    HIGH
Case 8      464.700    AVERAGE
Case 9      508.700    HIGH
Case 10     443.100    AVERAGE


FOR...NEXT
The syntax for a FOR...NEXT statement is:


FOR [index=n1 TO n2 [STEP=n3]]
statement
statement
NEXT


Here are some examples: 


FOR 
LEY AS] 
Ch 3 Pi gia 
NEXT 


FOR I=1 TO 10 
PRINT I 
NEXT 


FOR W=0 TO CARDIO<400 STEP URN
PRINT W 
NEXT 


FOR...NEXT loops are executed for each case (or number of times 
specified in a REPEAT statement). In the first example above, the 
FOR...NEXT is superfluous, since the two LET statements are exe- 
cuted only once for each case anyway. In the second example, the 
PRINT statement is executed 10 times for each case. 


The third example is bizarre, but illustrates some important points. First
of all, notice that the indices (n1, n2, n3) can be expressions. If CARDIO
is greater than or equal to 400 for a case, then W is printed only once
(as zero), because the index W runs from zero to zero. Otherwise, W
runs from zero to one in increments determined by a uniform random
number chosen once before the loop is executed. This means that the
PRINT statement will be executed a random number of times for each
case where CARDIO is less than 400. This type of construct can be
useful in Monte Carlo simulation.


Control
FOR...NEXT loops without an index are executed once. Otherwise,
FOR...NEXT loops are tested at the beginning to determine whether
they should be executed for the current value of index. See Example 7.6
for an explicit parsing of a FOR...NEXT loop using GOTO statements.
Some other languages may execute FOR...NEXT loops once even
when a condition is false. The following example will not print
anything:


FOR I=6 TO 3 STEP 1
PRINT I
NEXT
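

Testing at the beginning of the loop can be written out explicitly. The Python sketch below lists the index values a loop would visit. It is not SYSTAT code: the function name for_values is hypothetical, and only positive STEP values are handled, since this chapter does not describe negative steps.

```python
def for_values(start, stop, step=1):
    """Index values visited by FOR index=start TO stop STEP step (step > 0 assumed)."""
    if step <= 0:
        raise ValueError("this sketch handles positive steps only")
    values = []
    index = start
    while index <= stop:      # the test comes before the loop body runs
        values.append(index)
        index += step
    return values


print(for_values(6, 3, 1))    # []  (the body never runs)
print(for_values(1, 10, 2))   # [1, 3, 5, 7, 9]
```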


Nesting
You can nest up to ten FOR...NEXT loops in this version of SYSTAT.
In versions before 4.0, nesting was not allowed.


FOR
  FOR
    FOR
    NEXT
  NEXT
NEXT


You must always match every FOR with a NEXT. 


7.2 Conditional FOR...NEXT
The mean value for the variable PULMONAR from the data set
USDATA is approximately 26.4 with a standard deviation of 5.6.
Suppose you want to set RATE$ to “HIGH” and RATE to 1 every-
where that PULMONAR is more than one standard deviation above
average.


USE USDATA
SAVE NEWDATA
IF PULMONAR>32 THEN FOR
LET RATE$='HIGH'
LET RATE=1
NEXT
LIST PULMONAR, RATE$, RATE
RUN


Here are the first 10 cases output:


           PULMONAR    RATE$      RATE
Case 1       33.600    HIGH      1.000
Case 2       29.600                  .
Case 3       33.100    HIGH      1.000
Case 4       24.900                  .
Case 5       27.400                  .
Case 6       23.200                  .
Case 7       23.900                  .
Case 8       23.300                  .
Case 9       27.000                  .
Case 10      27.400                  .


For the cases that meet the condition PULMONAR>32, SYSTAT exe-
cutes the two transformations between FOR and NEXT. For those
cases that do not meet the condition, SYSTAT assigns missing values to
RATE$ and RATE.


7.3 FOR...NEXT with ELSE
This example shows the IF...THEN...ELSE format with the
FOR...NEXT statement. Suppose you want to make the following
assignments:


Where PULMONAR is      LET RATE$ =    LET RATE =
<20.8                  LOW            1
>=20.8 and <32.0       MID            2
>=32.0                 HIGH           3


The following program does this:


USE USDATA
SAVE NEWDATA
IF PULMONAR<20.8 THEN FOR
LET RATE$='LOW'
LET RATE=1
NEXT
ELSE IF PULMONAR<32.0 THEN FOR
LET RATE$='MID'
LET RATE=2
NEXT
ELSE FOR
LET RATE$='HIGH'
LET RATE=3
NEXT
LIST PULMONAR, RATE$, RATE
RUN


The first 10 cases output are: 


           PULMONAR    RATE$      RATE
Case 1       33.600    HIGH      3.000
Case 2       29.600    MID       2.000
Case 3       33.100    HIGH      3.000
Case 4       24.900    MID       2.000
Case 5       27.400    MID       2.000
Case 6       23.200    MID       2.000
Case 7       23.900    MID       2.000
Case 8       23.300    MID       2.000
Case 9       27.000    MID       2.000
Case 10      27.400    MID       2.000


If, for a case, PULMONAR is less than 20.8, SYSTAT executes the as-
sociated FOR...NEXT statements, setting the values of RATE$ to
LOW and RATE to 1. It does not execute the subsequent ELSE state-
ments but moves on to the next case.


If PULMONAR is greater than or equal to 20.8 but less than 32.0,
SYSTAT executes the first ELSE statement. SYSTAT sets RATE$ to
MID and RATE to 2 and does not execute the second ELSE statement.


If PULMONAR is greater than or equal to 32.0, SYSTAT executes the
last ELSE statement and sets RATE$ to HIGH and RATE to 3.


FOR...NEXT loops with subscripted variables
You can use FOR...NEXT to define program loops that assign incre-
mental values to an index variable. You can use such a loop to transform
a set of subscripted variables. SYSTAT executes the statements between
the FOR and the NEXT statements for each successive value of the in-
dex variable you specify. The index variable begins with the initial value
you assign, does the transformations, and then increases by one. The
cycle repeats until the index variable reaches the limit specified with
TO.


Examples:


FOR I=1 TO 5
FOR TRIAL=1 TO LAST
FOR J=2 TO 20 STEP 2

The STEP option adjusts the size of the increment. If you enter the fol-
lowing, SYSTAT increments N by two each time. Its values are there-
fore 1, 3, 5, 7, and 9 consecutively.


FOR N=1 TO 10 STEP 2


With this specification, SYSTAT runs through the loop only five times.


Note: if you want to execute a set of commands on a certain number of
cases, use the REPEAT command (see Using REPEAT in this chapter)
rather than the FOR...NEXT construct. Remember, every program is
executed once for each case. Therefore, if you use FOR...NEXT, DATA
runs through the loop for every case.


Temporary subscripts using the ARRAY statement
If your variable names are not already subscripted, you can use the
ARRAY statement before your BASIC program to assign subscripts
temporarily for the purpose of doing transformations inside a
FOR...NEXT loop. See the Programming examples chapter for more
information.


7.5 Logging ten variables
Suppose you have a file containing the variables X(1-10) and you want
to calculate the natural log of each. You could either enter ten separate
LET commands or use the FOR...NEXT looping construct to do this,
e.g.:


FOR N=1 TO 10 
LET X(N) = LOG(X(N)) 
NEXT 


SYSTAT runs through the loop ten times, increasing the value of N by 
one each time. Thus, N successively has the values 1, 2, 3, 4, 5, 6, 7, 8, 9, 
and 10. Therefore, this program is the same as: 

LET X(1)=LOG(X(1))

LET X(2)=LOG(X(2))

LET X(3)=LOG(X(3))

LET X(4)=LOG(X(4)) 

LET X(5)=LOG(X(5)) 

LET X(6)=LOG(X(6)) 

LET X(7)=LOG(X(7)) 

LET X(8)=LOG(X(8)) 

LET X(9)=LOG(X(9)) 

LET X(10)=LOG(X(10)) 


If you don’t want to clobber the values in X by replacing them with their 


logs, you can assign the transformed values to a new variable Y. See the 
DIM statement below for how to create a new subscripted variable. 


The STEP option adjusts the size of the increment. The following in-
creases N by two each time to values 1, 3, 5, 7, and 9 successively.
SYSTAT runs through the loop only five times.


FOR N=1 TO 10 STEP 2


Therefore, this program is the same as:


LET X(1)=LOG(X(1)) 
LET X(3)=LOG(X(3)) 
LET X(5)=LOG(X(5)) 
LET X(7)=LOG(X(7)) 
LET X(9)=LOG(X(9)) 


DIM
To add new subscripted variables to a file, you must use the DIM state-
ment first. DIM reserves space for new subscripted variables.


For example, the following DIM statement creates new variables X(1),
X(2), X(3), X(4), and X(5).


DIM X(5)


You can use subscripted variables defined with DIM in transformations. 
Suppose you have variables X(1-10). The following program creates 
new variables Y(1—-10) whose values are the natural logarithms of corre- 
sponding values in X(1—10). 


DIM Y(10)
FOR N=1 TO 10
LET Y(N) = LOG(X(N))
NEXT


Without the DIM statement, SYSTAT would not understand the Y
variable subscript in the LET statement and would respond with an er-
ror message.
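

For readers more familiar with general-purpose languages, the DIM program corresponds roughly to allocating a second array and filling it in a loop. The Python sketch below shows the idea for a single case. It is not SYSTAT code; the dictionaries x and y and the sample values 1.0 through 10.0 are illustrative assumptions.

```python
import math

# One case of a file with variables X(1)...X(10); the values are made up.
x = {n: float(n) for n in range(1, 11)}

# DIM Y(10): reserve Y(1)...Y(10) before they are assigned.
y = {n: None for n in range(1, 11)}

for n in range(1, 11):          # FOR N=1 TO 10
    y[n] = math.log(x[n])       # LET Y(N) = LOG(X(N))

print(x[10], y[10])
```

Because the logs go into Y, the original values in X are preserved.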


You cannot redimension existing or previously defined arrays of sub-
scripted variables. You can add new variables to an existing array by
entering them in the Data Editor, but you cannot do it in DATA.


GOTO
A GOTO statement jumps from the current statement to the numbered
statement specified. It works only with numbered statements.


For example, GOTO 10 makes DATA jump to statement 10. You can
combine GOTO with IF...THEN for programming flexibility, e.g.:


IF CASE=10 THEN GOTO 50
IF GROUP>3 THEN GOTO 100


7.6 Simple GOTO
Here is a simple SYSTAT BASIC program using GOTO:


10 LET I=J-L
20 LET I=I+L
30 IF I>K THEN GOTO 60
40 PRINT I
50 GOTO 20
60 STOP


It is equivalent to the following program that uses the FOR...NEXT
construct:


10 FOR I=J TO K STEP L
40 PRINT I
50 NEXT
60 STOP


You can use IF...THEN with GOTO to program loops like those pro-
vided by other languages such as Pascal or FORTRAN. For example,
you could program REPEAT...UNTIL, WHILE, or other flow-of-
control constructions that SYSTAT BASIC does not directly provide.
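

The equivalence of the GOTO program and the FOR...NEXT program can be checked by tracing both. The Python sketch below does so, reading statement 20 as LET I=I+L. It is not SYSTAT code; the function names are hypothetical, and a positive step L is assumed.

```python
def goto_version(j, k, l):
    """Trace of the numbered GOTO program; returns the values it would PRINT."""
    printed = []
    i = j - l                  # 10 LET I=J-L
    while True:
        i = i + l              # 20 LET I=I+L
        if i > k:              # 30 IF I>K THEN GOTO 60
            break
        printed.append(i)      # 40 PRINT I
        # 50 GOTO 20 (back to the top of the while loop)
    return printed             # 60 STOP


def for_version(j, k, l):
    """Trace of FOR I=J TO K STEP L ... NEXT with the test at the top."""
    printed = []
    i = j
    while i <= k:
        printed.append(i)
        i += l
    return printed


print(goto_version(1, 10, 3))  # [1, 4, 7, 10]
print(for_version(1, 10, 3))   # [1, 4, 7, 10]
```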


PRINT
The PRINT command prints the values of variables you specify. You
can also use PRINT to print character strings, which is often useful for
BASIC programs.

Suppose you have a program that sums the values of a variable. Also 
suppose that instead of recording the answer (sum) somewhere in the 
worksheet, you just want the program to display its results. You can do 
this by issuing a PRINT statement: 


PRINT “The sum of values in A is ",SUM 
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Such a program is presented as Example 10.16. Note how a PRINT 
statement lists the text literally (enclosed in quotation marks) and then 
lists the variable WINESUM after a comma. 


USE USDATA
HOLD
IF WINE<>. THEN LET WINESUM=WINESUM+WINE
IF EOF THEN PRINT "Sum of Wine =",WINESUM
RUN


As discussed in the Entering data chapter, DATA prints numeric and
character values in 12-column, right-justified fields. Blanks pad the left
of each field. Character strings that you specify literally (i.e., not those
that are values of character variables you listed in the PRINT command) 
are not justified; they are printed exactly the way you specified them, but 
without the surrounding quotation marks. 
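
As a rough illustration of the WINESUM program and of the field layout just described, here is a Python sketch. It is not SYSTAT code: `None` stands in for the missing value (.), and the three decimal places are an assumption taken from the listings elsewhere in this volume.

```python
MISSING = None  # stand-in for SYSTAT's missing value "."


def print_sum(label, values):
    """Sum the non-missing values, then print the label and the total."""
    total = 0.0                    # HOLD starts numeric variables at zero
    for value in values:           # the program body runs once per case
        if value is not MISSING:   # IF WINE<>. THEN ...
            total += value
    # The literal text is printed as given; the number is right-justified
    # in a 12-column field padded with blanks on the left.
    line = f"{label}{total:>12.3f}"
    print(line)
    return line


print_sum("Sum of Wine =", [1.5, MISSING, 2.5])
```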




STOP

The STOP command halts work on the current observation and clears
memory for work on the next observation. You will rarely need to use
STOP. A possible case where you might want STOP is to terminate a
loop when a certain value is reached. An example using STOP is
Example 10.6.

Using REPEAT

For any BASIC program, you can use the REPEAT command to limit
the action of the program to a certain number of cases, just as you can
use REPEAT 10 to limit the action of commands like LIST to the first
ten cases, for instance.

Thus, you can use REPEAT to test complex BASIC programs. If you
use REPEAT before RUN, you can see whether the program is correct
or if you need to change it before running it on an entire file. For
example:


REPEAT 3 

USE MYFILE 
SAVE TESTFILE 
LET X=LOG(X) 
LET Y=LOG(Y)
SORT 

RUN 
If you made a mistake writing the program, you would find that out
before wasting your time running the program on the entire file. For files
with several hundred cases, a brief trial run can pay off. After you are
sure everything is OK, type REPEAT with no arguments to restore the
counter.


Computation

Numerical accuracy

SYSTAT BASIC was designed specifically for statistical and scientific
computation. All arithmetic is done in double precision using algorithms
chosen for their accuracy. Therefore SYSTAT BASIC is usually more
accurate than other implementations of the BASIC programming
language. For example, if you type the following commands:


REPEAT 1 

LET X = INT(2.6*7-0.2)
PRINT X 

RUN 


SYSTAT prints the correct value, 18.000, and reports that one case and
one variable were processed. Some BASICs return an incorrect value of
17. You might want to try this in your own
computer’s version of BASIC. If it returns an incorrect value, do not 
trust any programs written in that language. 
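
The difference between the two answers can be reproduced with a short Python sketch. It assumes that the incorrect value comes from rounding every intermediate result to single precision, which is one plausible cause and not something this manual states. The helper `f32` is hypothetical and simply rounds a double to the nearest single-precision value.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Double precision throughout, as SYSTAT BASIC is described as doing.
double_result = int(2.6 * 7 - 0.2)

# The same expression with each intermediate result rounded to single precision.
single_result = int(f32(f32(f32(2.6) * 7) - f32(0.2)))

print(double_result, single_result)
```

In single precision the subtraction lands just below 18, so truncation yields 17; in double precision the result rounds to exactly 18.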


Memory limitations

You can run out of memory if you write too long a program. If you get
an out of memory error message, reduce the number of transformations
that you execute in a single run.


110 © 1989, SYSTAT, Inc. 





8 Sorting, ranking, and standardizing

Command reference
General strategy
Sorting
8.1 Simple sort
8.2 Nested sort
8.3 Computing medians
8.4 Computing quantiles
Ranking
8.5 Simple ranks
8.6 Ranking large files
8.7 Winsorized and trimmed means
8.8 Normalized scores
Standardizing
8.9 Standardizing ages


Overview

This chapter shows you how to sort, rank, and standardize data in
SYSTAT.

You can also sort, rank, and standardize using the regular Data Editor
with Sort..., Rank..., and Standardize... from the Data menu; see the
Getting Started manual for instructions.

DATA does offer some variations not available with the Data Editor;
see the examples.
















Command reference 





RANK varlist          Transforms all numeric variables in varlist
                      to ranks. Each variable is ranked within
                      its own distribution. The default is all
                      numeric variables in the file.

SORT varlist          Sorts the data file on the variables specified
                      in varlist. Varlist can include numeric or
                      character variables or both. The default is
                      all variables in the file, in the order that
                      they appear in the file.

STANDARDIZE varlist   Standardizes the numeric variables named in
                      varlist. The default is all numeric variables
                      in the file.

General strategy

Sorting

The SORT command sorts a file in ascending order by up to ten
numeric and/or character variables.


Examples are: 


SORT 

SORT AGE 

SORT NAME$ 
SORT SEX$, AGE 


If you do not specify any variables in the SORT command, as in the first
example, SYSTAT sorts using the first variable in the file. The last
example shows a nested sort. SYSTAT would sort this file first by SEX$
and then by AGE within SEX$.


Sorting orders cases in increasing numerical order. Missing values come 
first. SYSTAT sorts character data in ascending ASCII order, with 
blanks (missing values) at the beginning. The sequence for an ascending 
character sort is:

!"#$%&'()*+,-./
0123456789
:;<=>?@
ABCDEFGHIJKLM
NOPQRSTUVWXYZ
[\]^_`
abcdefghijklm
nopqrstuvwxyz
{|}~


This means that words are sorted alphabetically with upper case words 
preceding lower case. (Note that if you sort a character variable contain- 
ing numeric values, those values are sorted from left to right, rather than 
small to big: 1, 12, 150, 2, 31, 4000, 5.4, etc.) 
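
The same ordering rules can be seen in any language that compares strings by character code. The following Python sketch is only an analogy (the word lists are made up for illustration); it shows upper case sorting before lower case and numeric strings sorting from left to right.

```python
words = ["apple", "Zebra", "Mango", "zebra", "Apple"]
ascii_sorted_words = sorted(words)        # upper case precedes lower case

codes = ["2", "150", "1", "4000", "12", "5.4", "31"]
as_characters = sorted(codes)             # compared left to right as text
as_numbers = sorted(codes, key=float)     # compared by numeric value

print(ascii_sorted_words)
print(as_characters)
print(as_numbers)
```

If values are meant to sort from small to big, store them in a numeric variable and not in a character variable.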


If you want to retain the data in their sorted order, you must save them
to a new file. Sorting, by itself, does not create a sorted permanent file.

8.1 Simple sort

Enter the following data to use in the examples:


SAVE TEMP 
INPUT SEX$, AGE 
LET N=CASE
RUN 

FEMALE 5 
MALE 6 
MALE 4 
FEMALE 6 
FEMALE 5 
MALE 6 
MALE 8 
FEMALE 3 
MALE 6 
FEMALE 5 
MALE 4 
MALE 5 
FEMALE 5 
FEMALE 6 




The variable N stores the original case number, so each case includes a 
value for SEX$, AGE, and its index. 


The following program sorts the file TEMP on the variable SEX$.

USE TEMP
SAVE SORT1
SORT SEX$
RUN


SYSTAT reports on its progress. 


Begin sort 

14 cases sorted 
Saving sorted file 
End sort 


Now list the file.

USE SORT1
LIST
RUN

SEX$ AGE N 

Case 1 FEMALE 5.000 1.000 
Case 2 FEMALE 6.000 4.000 
Case 3 FEMALE 5.000 5.000 
Case 4 FEMALE 3.000 8.000 
Case 5 FEMALE 5.000 10.000 
Case 6 FEMALE 5.000 13.000 
Case 7 FEMALE 6.000 14.000 
Case 8 MALE 6.000 2.000 
Case 9 MALE 4.000 3.000 
Case 10 MALE 6.000 6.000 
Case 11 MALE 8.000 7.000 
Case 12 MALE 6.000 9.000 
Case 13 MALE 4.000 11.000 
Case 14 MALE 5.000 12.000 


SYSTAT has rearranged the file so that the cases where SEX$ equals
"FEMALE" come before those where SEX$ equals "MALE."
(Remember that N represents the original position of each case in the
file.)


This program sorts the file by AGE:

USE TEMP
SAVE SORT2
SORT AGE
RUN

Here are the cases in the new file SORT2.


SEX$ AGE N 
Case 1 FEMALE 3.000 8.000 
Case 2 MALE 4.000 3.000 
Case 3 MALE 4.000 11.000 
Case 4 FEMALE 5.000 1.000 
Case 5 FEMALE 5.000 5.000 
Case 6 FEMALE 5.000 10.000 
Case 7 MALE 5.000 12.000 
Case 8 FEMALE 5.000 13.000 
Case 9 MALE 6.000 2.000 
Case 10 FEMALE 6.000 4.000 
Case 11 MALE 6.000 6.000
Case 12 MALE 6.000 9.000
Case 13 FEMALE 6.000 14.000
Case 14 MALE 8.000 7.000 


In this example, SYSTAT has rearranged the cases so that the values of 
AGE go from smallest to largest down the file. 


8.2 Nested sort

This example illustrates a nested sort. The program first sorts on the
variable SEX$. Then, within SEX$, it sorts the cases based on AGE.

USE TEMP
SAVE SORT3
SORT SEX$, AGE
RUN

A LIST of the file SORT3 produces:

SEX$ AGE N
Case 1 FEMALE 3.000 8.000
Case 2 FEMALE 5.000 1.000 
Case 3 FEMALE 5.000 5.000 
Case 4 FEMALE 5.000 10.000 
Case 5 FEMALE 5.000 13.000 
Case 6 FEMALE 6.000 4.000 
Case 7 FEMALE 6.000 14.000 
Case 8 MALE 4.000 3.000 
Case 9 MALE 4.000 11.000 
Case 10 MALE 5.000 12.000 
Case 11 MALE 6.000 2.000 
Case 12 MALE 6.000 6.000 
Case 13 MALE 6.000 9.000 
Case 14 MALE 8.000 7.000 
Notice that again, SYSTAT arranges the file so that cases where SEX$
equals “FEMALE” come before those where SEX$ equals “MALE.” 
Within each of these groups, SYSTAT now arranges the cases so that 
the values for AGE go from smallest to largest. This is a nested sort. 
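
For readers who want to trace the nested sort by hand, here is a small Python sketch using the TEMP data entered above. It is an analogy only. Python's sort is stable, which happens to reproduce the order of N within tied cases shown in the listing; this manual does not state whether SYSTAT guarantees that behavior.

```python
sex = ["FEMALE", "MALE", "MALE", "FEMALE", "FEMALE", "MALE", "MALE",
       "FEMALE", "MALE", "FEMALE", "MALE", "MALE", "FEMALE", "FEMALE"]
age = [5, 6, 4, 6, 5, 6, 8, 3, 6, 5, 4, 5, 5, 6]

# Each case carries its original position N, as LET N=CASE does.
cases = [(s, a, n) for n, (s, a) in enumerate(zip(sex, age), start=1)]

# SORT SEX$, AGE: order by SEX$ first, then by AGE within SEX$.
nested = sorted(cases, key=lambda case: (case[0], case[1]))

for s, a, n in nested:
    print(f"{s:<8}{a:>8.3f}{n:>8.3f}")
```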


8.3 Computing medians

You can get medians with Stem in the Graph menu. Here is a DATA
program for computing medians.


The program has three steps. First, it sorts the file on the variable for 
which you want to find the median. Next, the program creates the vari- 
ables N, N1, and N2. N is the total number of cases. If the number of 
cases is odd, N1 and N2 are both the case number of the middle case. If 
the number of cases is even, N1 and N2 are the case numbers of the two 
middle cases. The HOLD command keeps the final values of these vari- 
ables in memory for use in the third step. Finally, the program computes 
the median value. 


Here we find the median of CARDIO:

USE USDATA
SAVE SORTDATA
SORT CARDIO
RUN

USE SORTDATA
HOLD
LET N=CASE
LET N1=INT(N/2)+1
LET N2=N-N1+1
RUN

IF CASE=N1 THEN LET MEDIAN=MEDIAN+CARDIO/2
IF CASE=N2 THEN LET MEDIAN=MEDIAN+CARDIO/2
IF EOF THEN PRINT "The median of CARDIO is",MEDIAN
RUN





The median value of CARDIO is 832.400 


50 Cases and 6 variables processed. 
No SYSTAT file created. 


You can find the median of a variable in a small to medium-sized file 
with Stem or Box in the Graph menu. This DATA program is not 
limited by internal memory, so it can find the median for any number of 
cases. 
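
The three steps of the median program can be sketched in Python as follows. This is an illustration of the logic only, assuming a variable with no missing values; the function name is hypothetical.

```python
def median_from_sorted_cases(values):
    """Median via the N1/N2 middle-case rule used in the DATA program."""
    ordered = sorted(values)            # step 1: SORT
    n = len(ordered)                    # step 2: N is the last case number
    n1 = n // 2 + 1                     # N1 = INT(N/2)+1
    n2 = n - n1 + 1                     # N2 = N-N1+1
    median = 0.0                        # HOLD starts MEDIAN at zero
    for case, value in enumerate(ordered, start=1):   # step 3
        if case == n1:
            median += value / 2
        if case == n2:
            median += value / 2
    return median


print(median_from_sorted_cases([5, 1, 3]))
print(median_from_sorted_cases([4, 1, 3, 2]))
```

With an odd number of cases N1 and N2 coincide, so the middle value is added twice at half weight; with an even number the two middle values are averaged.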


8.4 Computing quantiles

The following program prints the quantiles of a variable:

USE RAWDATA

SAVE SORTDATA 

SORT X 

RUN 

USE SORTDATA 

HOLD 

LET N=CASE 

RUN 

LET Q=CASE/(N+1) 

LIST X,Q 

RUN 
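
The quantile formula Q=CASE/(N+1) is easy to check by hand. The following Python sketch applies it to a made-up list of values; it assumes no missing values and is not SYSTAT code.

```python
def quantiles(values):
    """Pair each sorted value with its sample quantile CASE/(N+1)."""
    ordered = sorted(values)                       # SORT X
    n = len(ordered)                               # N = last case number
    return [(x, case / (n + 1)) for case, x in enumerate(ordered, start=1)]


for x, q in quantiles([10, 30, 20]):
    print(x, q)
```

Dividing by N+1 (and not by N) keeps every quantile strictly between 0 and 1.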


Ranking

The RANK command ranks variables. Examples are:


RANK 
RANK RAINFALL 
RANK JUDGEMENT, SCORE 


The first example ranks all the numeric variables in the file. When 
SYSTAT ranks a variable, it replaces the original values with their 
ranks. If two or more values are the same, SYSTAT averages their 
ranks. 
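
To make the averaging of tied ranks concrete, here is a Python sketch applied to the AGE values of the TEMP file created earlier in this chapter. It illustrates the rule described above and is not SYSTAT's own algorithm; the function name is hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2     # mean of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


age = [5, 6, 4, 6, 5, 6, 8, 3, 6, 5, 4, 5, 5, 6]
print(average_ranks(age))
```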





8.5 Simple ranks

This example uses the data file TEMP created in the previous section.
The example creates a variable AGERANK which contains the rank
values of AGE.

USE TEMP
SAVE RANKDATA
LET AGERANK=AGE
RANK AGERANK
RUN
14 Cases and 3 variables processed.
SYSTAT file created. 
Please wait while data are processed and resaved. 
Begin rank. 
End rank. 


Now list AGE and AGERANK.

USE RANKDATA
LIST AGE, AGERANK
RUN

               AGE     AGERANK
Case  1      5.000       6.000
Case  2      6.000      11.000
Case  3      4.000       2.500
Case  4      6.000      11.000
Case  5      5.000       6.000
Case  6      6.000      11.000
Case  7      8.000      14.000
Case  8      3.000       1.000
Case  9      6.000      11.000
Case 10      5.000       6.000
Case 11      4.000       2.500
Case 12      5.000       6.000
Case 13      5.000       6.000
Case 14      6.000      11.000

This shows that case 8 has the lowest value of AGE, cases 3 and 11 share
the next lowest value, and so on up to case 7, which has the greatest
value.


Here we show AGE and AGERANK from the sorted file. Note that 
ranks are simply the case numbers of the sorted file, with ties averaged. 





AGE AGERANK 
Case 1 3.000 1.000 
Case 2 4.000 2.500
Case 3 4.000 2.500
Case 4 5.000 6.000
Case 5 5.000 6.000
Case 6 5.000 6.000 
Case 7 5.000 6.000 
Case 8 5.000 6.000 
Case 9 6.000 11.000 
Case 10 6.000 11.000 
Case 11 6.000 11.000 
Case 12 6.000 11.000 
Case 13 6.000 11.000 
Case 14 8.000 14.000 

8.6 Ranking large files

The RANK command does its work in memory. Therefore, you can run
out of room with files containing thousands of cases. If this happens,
first SORT the file by the variable you wish to rank. Then, replace the
values of the rank variable with the case number. The following
program again ranks the variable AGE:


USE TEMP 

LET AGERANK=AGE 
SORT AGERANK 

RUN 

LET AGERANK=CASE 
SAVE RANKDAT2 
LIST AGE,AGERANK 
RUN 


Note that in this example the data do not remain in the original order. 
Preserving that order requires a more complicated program: 


USE TEMP 
LET ORIGINAL=CASE 
LET AGERANK=AGE 
SORT AGERANK 

RUN 

LET AGERANK=CASE 
SORT ORIGINAL 
SAVE RANKDAT3 
DROP ORIGINAL 

RUN 


Neither program is capable of averaging tied ranks. (That task is left as
an exercise for the reader.) 
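
The second program (rank by sorting, then restore the original order) follows a pattern that can be sketched in Python as below. The sketch is an analogy under the same limitation: tied values receive consecutive ranks, not averaged ones. The function name is hypothetical.

```python
def rank_by_sorting(values):
    """Rank by sorting, then put the cases back in their original order."""
    original = range(1, len(values) + 1)            # LET ORIGINAL=CASE
    rows = sorted(zip(values, original))            # SORT AGERANK
    ranked = [(orig, case)                          # LET AGERANK=CASE
              for case, (_, orig) in enumerate(rows, start=1)]
    ranked.sort()                                   # SORT ORIGINAL
    return [rank for _, rank in ranked]             # DROP ORIGINAL


print(rank_by_sorting([5, 3, 8]))
print(rank_by_sorting([4, 4, 1]))
```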


8.7 Winsorized and trimmed means

To "ten percent trim" a variable, rank the variable and then DELETE
the upper and lower ten percent of the data values. You can then
compute a ten percent trimmed mean of the variable with Stats from the
Stats menu. This example computes a ten percent trim for the variable X
from a data file called RAWDATA.

The first part of the program makes a copy of X that we then use for
ranking.


USE RAWDATA 


LET RX=X 
RANK RX 
RUN 


The next part finds the number of cases in the file. We use HOLD to 
keep the last value of N for the final part of the program. 


HOLD
LET N=CASE
RUN

The final part deletes the extreme observations.

SAVE RAWDATA
IF RX<N/10 OR RX>9*N/10 THEN DELETE
RUN


Programs similar to the trimmed means program can Winsorize means,
biweight cases, and weight cases with various schemes through use of
the RANK function. See Barnett and Lewis (1978), Launer and
Wilkinson (1979), and Huber (1977) for more information on these
procedures. You can also use fractional ranks in formulas to compute 
trimmed means, if you wish. 
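
The deletion rule of the trimming program (drop a case when its rank is below N/10 or above 9*N/10) can be tried on a small list with this Python sketch. It is illustrative only: for brevity it assigns consecutive ranks to ties, whereas RANK averages them, and the function name is hypothetical.

```python
def ten_percent_trim(values):
    """Keep the cases whose rank is neither below N/10 nor above 9*N/10."""
    n = len(values)                                   # N = number of cases
    order = sorted(range(n), key=lambda i: values[i])
    rank = [0] * n
    for position, i in enumerate(order, start=1):
        rank[i] = position                            # RX (ties not averaged here)
    return [v for v, r in zip(values, rank)
            if not (r < n / 10 or r > 9 * n / 10)]    # IF ... THEN DELETE


kept = ten_percent_trim(list(range(1, 21)))
print(kept)
print(sum(kept) / len(kept))
```

Note that with strict inequalities the rule is not perfectly symmetric: with 20 cases it removes one case from the bottom and two from the top.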


8.8 Normalized scores

A normalized score is the standard normal deviate corresponding to the
sample quantile of an observed value. It can be thought of in two ways:
as the z-score the value would have if the observed distribution were
perfectly normal, or as the distance of the value from the mean in
standard units. In any case, do not confuse normalized scores, which
necessarily have a perfect normal distribution, with z scores, which do
not. (See "Standardizing" below for how to produce z scores.)

Converting scores to normal scores reshapes the observed distribution
into a normal distribution. In practice, the effectiveness of this
procedure is limited by the number of distinct values and their
distribution in the original sample. There are limits to how normal one
can make binary data! Some nonparametric tests are equivalent to
performing parametric tests (t-tests, ANOVA, etc.) on normalized values
of dependent variables.

The following is like the program we used to produce trimmed means.
It saves normalized scores for X (XNORM) back into the file.

The first part makes a copy of X that we then use for ranking.


USE RAWDATA 


LET RX=X 
RANK RX 
RUN 


The next part finds the number of cases in the file. We use HOLD 
because we need to use the final value of N in the next step. 


HOLD 
LET N=CASE 
RUN 


The final part applies the inverse normal cumulative distribution
function to the sample quantiles. See the Transforming variables chapter
for information about ZIF (Z inverse function).


SAVE RAWDATA 

LET XNORM = ZIF(RX/(N+1)) 
DROP RX,N 

RUN 
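
The computation XNORM = ZIF(RX/(N+1)) can be mimicked in Python with the standard library's inverse normal cumulative distribution function. This is a sketch of the idea only: it assumes no tied or missing values, and `NormalDist().inv_cdf` is Python's counterpart to ZIF, not the SYSTAT function itself.

```python
from statistics import NormalDist


def normalized_scores(values):
    """Replace each value by the normal deviate of its quantile rank/(n+1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):          # RANK RX
        scores[i] = NormalDist().inv_cdf(rank / (n + 1))
    return scores


print(normalized_scores([30, 10, 20]))
```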


Standardizing

You can standardize one or more variables in DATA with the
STANDARDIZE command. STANDARDIZE replaces values of
variables with their sample standard scores, or z-scores.


Examples are: 


STANDARDIZE 
STANDARDIZE QUESTION(1-5) 


8.9 Standardizing ages

This example uses the data set TEMP we created before Example 8.1 at
the beginning of this chapter. The example creates and lists a variable
AGESTAND that contains the standardized values of AGE.


USE TEMP 

SAVE STANDATA 

LET AGESTAND=AGE 

STANDARDIZE AGESTAND 
RUN

USE STANDATA 

LIST AGE, AGESTAND 

RUN 


14 Cases and 4 variables processed. 
SYSTAT file created. 
Please wait while data are processed and resaved.
Begin standardize
End standardize 


               AGE    AGESTAND
Case  1      5.000      -0.237
Case  2      6.000       0.593
Case  3      4.000      -1.068
Case  4      6.000       0.593
Case  5      5.000      -0.237
Case  6      6.000       0.593
Case  7      8.000       2.254
Case  8      3.000      -1.898
Case  9      6.000       0.593
Case 10      5.000      -0.237
Case 11      4.000      -1.068
Case 12      5.000      -0.237
Case 13      5.000      -0.237
Case 14      6.000       0.593



AGESTAND now has standardized AGE values, with mean 0 and stan- 
dard deviation 1. Remember that standardizing does not change the 
shape of your data. If the data are highly skewed or bimodal before stan- 
dardizing, they will be so after. Standardizing simply moves the location 
and spread of your values. 
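
The AGESTAND values in the listing can be reproduced with a few lines of Python. The sketch assumes the sample standard deviation (divisor n-1), which is what matches the printed values; it is an independent check, not SYSTAT code.

```python
from statistics import mean, stdev


def standardize(values):
    """z-scores: subtract the mean and divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)          # divisor n-1
    return [(v - m) / s for v in values]


age = [5, 6, 4, 6, 5, 6, 8, 3, 6, 5, 4, 5, 5, 6]
z = standardize(age)
print([round(value, 3) for value in z])
```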
9 Subgroup processing

Command reference
Built-in grouping variables
9.1 Printing the last case in a file
9.2 Computing subgroup means





Overview 
This chapter shows how to process subgroups of data using DATA’s 
built-in variables BOF (Beginning Of File), EOF (End Of File), BOG 
(Beginning Of Group), and EOG (End Of Group). 


Note that to pick out only one subgroup in DATA, you can use either
IF...THEN or DELETE. Furthermore, every statistical module has BY
and SELECT to create temporary subgroups.

Command reference


BY varlist            Activates the two system variables BOG
                      and EOG (Beginning Of Group and End
                      Of Group).

Built-in grouping variables





You can write programs that operate on subgroups of cases in a file. To 
do this, you must first specify variables in your file that define 
subgroups. DATA has four special grouping variables that are always 
available for processing subgroups: 


BOF has value 1 if beginning-of-file, else it is 0 

EOF has value 1 if end-of-file, else it is 0 

BOG has value 1 if beginning-of-BY group, else it is 0 
EOG has value 1 if end-of-BY group, else it is 0 


The BY statement identifies the variables that define subgroups with 
BOG and EOG in your data. You may name up to 10 variables in a BY 
statement. You must sort your file by these variables. To clear a previous 
BY command, type BY with no arguments. 


Note that BY does not cause subsequent commands to be executed on
subgroups the way By groups... from the Data menu does. Instead, it
specifies which variable or variables are used for BOG and EOG.


You may use BOG, EOG, BOF, and EOF within conditional 
expressions in IF... THEN statements. For example, the statement: 


IF BOG THEN statement

causes SYSTAT to execute statement every time it encounters a new
value in a BY variable. This is because the value of BOG is 1 (“true”) for 
every case that begins a new group and 0 otherwise. 
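
To see how the four grouping variables behave case by case, consider this Python sketch. It computes BOF, EOF, BOG, and EOG for a file that is already sorted by a single BY variable; the function name and the sample keys are made up for illustration.

```python
def group_flags(keys):
    """Return (BOF, EOF, BOG, EOG) for each case of a file sorted by keys."""
    n = len(keys)
    flags = []
    for i, key in enumerate(keys):
        bof = int(i == 0)                               # first case in the file
        eof = int(i == n - 1)                           # last case in the file
        bog = int(i == 0 or keys[i - 1] != key)         # BY value just changed
        eog = int(i == n - 1 or keys[i + 1] != key)     # BY value about to change
        flags.append((bof, eof, bog, eog))
    return flags


for key, flag in zip([1, 1, 2, 3, 3], group_flags([1, 1, 2, 3, 3])):
    print(key, flag)
```

A group with a single case, such as the middle one here, has both BOG and EOG equal to 1.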


9.1 Printing the last case in a file

The following procedure prints the value of CARDIO for the last case
in the USDATA file. EOF is 0 for every case but the last, where its
value is 1.

USE USDATA
IF EOF THEN PRINT,
"The CARDIO value for the last case is", CARDIO
RUN


To print all but the last case, you can set the condition to one of the 
following: 


IF EOF=0 THEN ...
IF NOT EOF THEN ...

9.2 Computing subgroup means



The data set USDATA is sorted on the variable REGION. There are 4 
regions and therefore 4 BY groups. This program calculates the mean of 
the variable SPIRITS for each region. The HOLD command enables 
the program to sum across cases. On the last case for each group, 
SYSTAT calculates and prints the mean, then resets SUM and N to 0. 


USE USDATA 
BY REGION 
HOLD 
LET N=N+1 
LET SUM=SUM+SPIRITS 
IF EOG THEN FOR 
LET MEAN=SUM/N 
PRINT "The mean spirits consumption rate" 
PRINT " for Region",REGION," is",MEAN
LET SUM=0 
LET N=0 
NEXT 
RUN 
The mean spirits consumption rate 
for Region 1.000 is 3.149 
The mean spirits consumption rate 
for Region 2.000 is 2.203 
The mean spirits consumption rate 
for Region 3.000 is 2.256 
The mean spirits consumption rate 
for Region 4.000 is 2.785 
50 cases and 46 variables processed. 
SYSTAT file created. 
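
The accumulate, report, and reset pattern of this program carries over directly to other languages. Here is a Python sketch of the same logic, run on a made-up list of (REGION, SPIRITS) pairs that is assumed to be sorted by region already.

```python
def subgroup_means(cases):
    """cases: (region, spirits) pairs already sorted by region."""
    results = []
    total, n = 0.0, 0                       # HOLD keeps these across cases
    for i, (region, spirits) in enumerate(cases):
        n += 1                              # LET N=N+1
        total += spirits                    # LET SUM=SUM+SPIRITS
        eog = i == len(cases) - 1 or cases[i + 1][0] != region
        if eog:                             # IF EOG THEN FOR ... NEXT
            results.append((region, total / n))
            total, n = 0.0, 0               # reset for the next group
    return results


print(subgroup_means([(1, 2.0), (1, 4.0), (2, 3.0)]))
```

Forgetting the reset at the end of each group would make every mean after the first a running mean over all earlier groups as well.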

10 Programming examples





Overview

The examples in this chapter show more applications of SYSTAT
BASIC, including statistical calculations and data management
procedures. There are simpler ways to accomplish many of these tasks,
particularly generating random numbers, but the programs in this chapter
were selected to illustrate the full range of SYSTAT BASIC capabilities.

The chapter also introduces the HOLD command, which retains
variables in memory for successive operations. HOLD enables you to do
sums and counts of your data across cases, create lag variables, and
execute complex data transformations.





Command reference

ARRAY array / varlist   Aliases the variables in varlist to an array
                        of subscripted variables. The variables
                        have the root name array with integer
                        subscripts 1 through n, where n is the
                        number of variables in varlist. Note that
                        ARRAY works differently in versions prior
                        to 3.2. See Example 10.2 for a demonstration.

HOLD                    Initializes all numeric values in a BASIC
                        program to zero and retains numeric values
                        from one case to the next. HOLD stays in
                        effect until you QUIT the program, type NEW,
                        or USE another file.

RSEED=#                 Specifies the random number seed #. The
                        default is 313. You can specify any integer
                        between 1 and 30,000.

Advanced applications


Operations within rows

The following examples show you how to use the array capabilities in
SYSTAT BASIC to perform operations within rows (cases) of a dataset.

10.1 Computing means of subscripted variables

The following program computes the average of the variables X(1)
through X(10) for each case. The program checks for missing data. You
can calculate the mean more easily with the multi-variable function
AVG.

USE MYDATA
SAVE NEWDATA
LET SUMX=0
LET N=0
FOR I=1 TO 10
IF X(I)<>. THEN FOR
LET SUMX=SUMX+X(I)
LET N=N+1
NEXT
NEXT
IF N<>0 THEN LET MEAN=SUMX/N
ELSE LET MEAN=.
RUN


Like all BASIC programs, the program runs once for each case. 


At the start of each case, LET SUMX=0 and LET N=0 set the variables 
SUMX and N to zero. SUMX sums the non-missing values across each 
case, and N counts the non-missing values for each case. 


The FOR...NEXT loop runs the variables X(1-10) through two
conditional transformations. In the first, if X(I) is not missing, its
value is added to SUMX. In the second, again if X(I) is not missing,
the count variable N is increased by one.

Upon completion of the FOR...NEXT loop, another conditional
transformation tests whether N is not equal to zero. If N is not equal to
zero, then the calculation MEAN=SUMX/N is executed. If N equals
zero (because the values of X(1-10) for the current case are all missing),
dividing by N would cause an error. Therefore, SYSTAT executes the
ensuing ELSE statement and sets MEAN to missing (.).

10.2 Computing means of unsubscripted variables

To average variables that are not subscripted, use the ARRAY command
to alias the variables with a subscripted variable and then use the same
logic as above. The example below averages the values of the liquor
consumption variables in the data set USDATA. Although USDATA has
no missing values, the program tests for them anyway.


USE USDATA 
SAVE NEWDATA 
ARRAY LIQUOR/SPIRITS,WINE,BEER 
LET SUMALCOH=0 
LET N=0 
FOR I=1 TO 3 
IF LIQUOR(I)<>. THEN FOR 
LET SUMALCOH=SUMALCOH+LIQUOR(I)
LET N=N+1
NEXT
NEXT
IF N<>0 THEN LET MEAN=SUMALCOH/N
ELSE LET MEAN=. 
RUN 


SYSTAT treats each variable specified in the ARRAY statement as an
element in a subscripted variable named LIQUOR:


SPIRITS = LIQUOR(1) 
WINE = LIQUOR(2) 
BEER = LIQUOR(3) 
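
The row-mean logic, including the aliasing of named variables to positions, can be sketched in Python as follows. Here a case is a dictionary, `None` stands in for the missing value (.), and the list of names plays the role of the ARRAY statement; all of these are illustrative choices, not SYSTAT features.

```python
MISSING = None  # stand-in for SYSTAT's missing value "."


def row_mean(case, names):
    """Average the non-missing values of the named variables in one case."""
    total, n = 0.0, 0                    # LET SUMALCOH=0, LET N=0
    for name in names:                   # FOR I=1 TO 3 over LIQUOR(I)
        value = case[name]
        if value is not MISSING:         # IF LIQUOR(I)<>. THEN ...
            total += value
            n += 1
    return total / n if n != 0 else MISSING   # ELSE LET MEAN=.


liquor = ["SPIRITS", "WINE", "BEER"]     # ARRAY LIQUOR/SPIRITS,WINE,BEER
print(row_mean({"SPIRITS": 1.0, "WINE": MISSING, "BEER": 3.0}, liquor))
```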


Generating random numbers

SYSTAT contains many built-in random number generators. These can
generate random numbers with a uniform distribution, standard normal
distribution, t distribution, F distribution, chi-square distribution, Beta
distribution, or Gamma distribution. See the Transforming variables
chapter for the names of these functions.
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10.3 

Uniform 
distribution on 
(0,1) 
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Use the built-in function URN to obtain uniformly distributed random 
numbers. SYSTAT generates uniform random numbers between zero 
and one by a triple modulo method. Each uniform is constructed from 
three multiplicative congruential generators with prime modulus. The 
initial seeds for each generator are 13579, 12345, and 313 (Wichmann 
and Hill, 1982). You may reset the last random number seed by using 
the RSEED command, where # is an integer between one and 30,000. 


RSEED=# 
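
For readers curious about the triple modulo method, here is a Python sketch of the generator published by Wichmann and Hill. The multipliers and prime moduli below are those of the published algorithm; that SYSTAT uses exactly these constants is an assumption, as is the idea that RSEED replaces only the third seed.

```python
class WichmannHill:
    """Combination of three multiplicative congruential generators."""

    def __init__(self, s1=13579, s2=12345, s3=313):
        # Default seeds are the initial seeds quoted in the text.
        self.s1, self.s2, self.s3 = s1, s2, s3

    def urn(self):
        """Return the next uniform random number in [0, 1)."""
        self.s1 = 171 * self.s1 % 30269
        self.s2 = 172 * self.s2 % 30307
        self.s3 = 170 * self.s3 % 30323
        # The fractional part of the sum of the three scaled states.
        return (self.s1 / 30269 + self.s2 / 30307 + self.s3 / 30323) % 1.0


default = WichmannHill()
reseeded = WichmannHill(s3=1000)     # analogous to RSEED=1000 (assumed)
print(default.urn(), reseeded.urn())
```

Because the moduli are just above 30,000, seeds between 1 and 30,000 are always valid states for each component generator.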


Use the built-in function ZRN to obtain normally distributed random
numbers. ZRN generates pseudo-random standard normal variates with
a mean of 0 and a standard deviation of 1. SYSTAT generates normal
random numbers from uniforms by applying the inverse normal
cumulative distribution function to uniform variates between 0 and 1.


The following examples use SYSTAT BASIC to generate random num- 
bers for a variety of distributions using only the uniform and normal 
generators. Each example demonstrates how to generate a sample from 
a different type of random distribution. 


The REPEAT 100 command in each program tells SYSTAT to create 
100 cases. Change this value to vary the number of cases in the gener- 
ated samples. 


10.3 Uniform distribution on (0,1)

This program creates a variable X that contains uniform random
numbers between zero and one.


SAVE URANDOM 


REPEAT 100 
LET X=URN 
RUN 

10.4 Uniform distribution on (a,b)

This program generates a variable U that contains uniform random
numbers between A and B.


SAVE URANDOM 
REPEAT 100 

LET A=0 

LET B=10 

LET U=A+(B-A)*URN 
DROP A,B 

RUN 


10.5 Uniform integers

This program generates a variable I that contains uniform random
integers between and including A and B.


SAVE IRANDOM 

REPEAT 100 

LET A=1 

LET B=9 

LET I=A+INT(URN*(B-A+1))
DROP A,B 

RUN 
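
The two transformations used in Examples 10.4 and 10.5 are sketched below in Python, with `random.Random` standing in for URN. The seed value is arbitrary and the function names are illustrative.

```python
import random


def uniform_ab(a, b, rng):
    """U = A + (B-A)*URN: uniform on the interval from a to b."""
    return a + (b - a) * rng.random()


def uniform_int(a, b, rng):
    """I = A + INT(URN*(B-A+1)): uniform integer from a to b inclusive."""
    return a + int(rng.random() * (b - a + 1))


rng = random.Random(313)
reals = [uniform_ab(0, 10, rng) for _ in range(1000)]
integers = [uniform_int(1, 9, rng) for _ in range(1000)]
print(min(reals), max(reals))
print(sorted(set(integers)))
```

The +1 inside the integer formula is what makes the upper limit B attainable; without it the largest integer generated would be B-1.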


10.6 Normal distribution using URN

Using uniform random numbers, this program creates a variable X that
contains random numbers with a mean of 0 and a standard deviation of
1. It uses a modification of the Box-Muller method (Box & Muller,
1958; Marsaglia, 1961; Sibuya, 1962). The built-in function ZRN is, of
course, faster. See the next example.


SAVE NRANDOM 
REPEAT 100

10 LET X = 2*URN - 1

20 LET Y = 2*URN - 1

30 LET XY = X*X+Y*Y

40 IF XY>=1 THEN GOTO 10 

50 LET Z = SQR(-2*LOG(XY)/XY) 
60 LET NRAN1 = X*Z 

70 LET NRAN2 = Y*Z 

80 STOP 

RUN 
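
The rejection loop in statements 10 through 40 is the polar form of the Box-Muller method. A Python sketch of the same steps follows, with `random.Random` standing in for URN. The extra guard against a sum of squares of exactly zero is an addition of this sketch; the SYSTAT program above only rejects values of 1 or more.

```python
import math
import random


def polar_pair(rng):
    """Return two independent standard normal variates (NRAN1, NRAN2)."""
    while True:
        x = 2 * rng.random() - 1          # 10 LET X = 2*URN - 1
        y = 2 * rng.random() - 1          # 20 LET Y = 2*URN - 1
        xy = x * x + y * y                # 30 LET XY = X*X+Y*Y
        if 0 < xy < 1:                    # 40 IF XY>=1 THEN GOTO 10
            break
    z = math.sqrt(-2 * math.log(xy) / xy)     # 50
    return x * z, y * z                       # 60, 70


rng = random.Random(313)
sample = [value for _ in range(2000) for value in polar_pair(rng)]
sample_mean = sum(sample) / len(sample)
sample_var = sum((v - sample_mean) ** 2 for v in sample) / (len(sample) - 1)
print(sample_mean, sample_var)
```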

10.7 Normal distribution using ZRN

This program creates a variable X that contains random numbers with a
mean of 0 and a standard deviation of 1. It uses the built-in normal
random variate function.


SAVE NRANDOM 
REPEAT 100 
LET X=ZRN 
RUN 


10.8 Normal distribution with specified parameters

This program generates a variable Z that contains normal random
numbers with mean MU and standard deviation SIGMA.


SAVE ZRANDOM 


REPEAT 100 

LET MU=value

LET SIGMA=value 
LET Z=MU+SIGMA*ZRN 
DROP MU,SIGMA 

RUN 


10.9 Chi-square distribution

This program generates a variable CHISQ that contains random numbers
from a chi-square distribution with NDF degrees-of-freedom. The XRN
function (see the Transforming variables chapter) is faster.


SAVE CRANDOM 
REPEAT 100 
LET NDF=10 
LET CHISQ=0 
FOR I=1 TO NDF 
LET CHISQ=CHISQ+ZRN^2
NEXT 
DROP NDF 
RUN 
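
The chi-square program sums NDF squared standard normal variates. The Python sketch below does the same, with `rng.gauss(0, 1)` standing in for ZRN, and checks that the sample mean is near NDF, which is the mean of a chi-square distribution. The seed is arbitrary.

```python
import random


def chi_square(ndf, rng):
    """Sum of ndf squared standard normal variates."""
    chisq = 0.0                          # LET CHISQ=0
    for _ in range(ndf):                 # FOR I=1 TO NDF
        chisq += rng.gauss(0, 1) ** 2    # LET CHISQ=CHISQ+ZRN^2
    return chisq


rng = random.Random(313)
draws = [chi_square(10, rng) for _ in range(2000)]
print(sum(draws) / len(draws))
```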
10.10 t distribution

This program generates a variable T that contains random numbers from
a t distribution with NDF degrees-of-freedom. It does this by taking the
ratio of a normal to the square root of a chi-square divided by its
degrees-of-freedom. The built-in function TRN is faster.


SAVE TRANDOM 
REPEAT 100 
LET NDF=10 
LET CHISQ=0 
FOR I=1 TO NDF 
LET CHISQ=CHISQ+ZRN^2
NEXT 
LET T=ZRN/SQR(CHISQ/NDF) 
DROP NDF,CHISQ 





RUN 
10.11 F distribution

This program generates a variable F that contains random numbers from
an F distribution with MDF and NDF degrees-of-freedom. It does this by
taking the ratio of two chi-squares divided by their degrees-of-freedom.
The built-in function FRN is, of course, faster.


SAVE FRANDOM 
REPEAT 100 
LET MDF=2 
LET NDF=10 
LET CHISQ1=0 
LET CHISQ2=0 
FOR I=1 TO MDF
LET CHISQ1=CHISQ1+ZRN^2
NEXT
FOR I=1 TO NDF
LET CHISQ2=CHISQ2+ZRN^2
NEXT
LET F=(CHISQ1/MDF)/(CHISQ2/NDF)
DROP MDF,NDF,CHISQ1,CHISQ2
RUN 
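
The t and F constructions build on the chi-square sum in the same way. Here is a Python sketch of both ratios, again with `rng.gauss(0, 1)` standing in for ZRN and an arbitrary seed. For NDF greater than 2 the mean of an F distribution is NDF/(NDF-2), which gives a simple check on the output.

```python
import math
import random


def chi_square(ndf, rng):
    """Sum of ndf squared standard normal variates."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(ndf))


def t_variate(ndf, rng):
    """T = ZRN/SQR(CHISQ/NDF)."""
    return rng.gauss(0, 1) / math.sqrt(chi_square(ndf, rng) / ndf)


def f_variate(mdf, ndf, rng):
    """F = (CHISQ1/MDF)/(CHISQ2/NDF)."""
    return (chi_square(mdf, rng) / mdf) / (chi_square(ndf, rng) / ndf)


rng = random.Random(313)
t_draws = [t_variate(10, rng) for _ in range(4000)]
f_draws = [f_variate(2, 10, rng) for _ in range(4000)]
print(sum(t_draws) / len(t_draws))
print(sum(f_draws) / len(f_draws))   # theory: 10/(10-2) = 1.25
```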


10.12 Multinormal random variables

Here is how to generate multinormal random variables with known
covariance matrix.

1) Input the population covariance matrix with DATA, specifying
TYPE=COVARIANCE; or enter it in the Data Editor, choosing
"Covariance" in the Editor/Formats... dialog box.

2) Obtain principal components for the matrix with Factor/Principal
components...

3) Generate normal random numbers in DATA and multiply them by
the factor loadings. Here is an example:


DATA
INPUT A,B,C
TYPE COVARIANCE
SAVE COVA
RUN
2
1 3
.5 1.5 4

FACTOR       or   * Select Factor/Principal components... from the
FACTOR              Stats menu
                  * Click OK

DATA
REPEAT 1000 (or whatever N you want)
SAVE NORRAN
LET Z1=ZRN
LET Z2=ZRN
LET Z3=ZRN
LET F1= .666 * Z1 + .908 * Z2 + .856 * Z3
LET F2=1.379 * Z1 + .766 * Z2 - .716 * Z3
LET F3=1.742 * Z1 - .953 * Z2 + .240 * Z3
DROP Z1, Z2, Z3
RUN
The numbers in the three equations yielding F1-F3 are the “component
loadings” printed in the Factor output. If you have many variables, you
may want to use subscripts in the above example. You can use the BY
command to generate multiple samples for a simulation.

To check your work, use Corr to obtain the covariance matrix of the
variables in the generated sample:


CORR          or   • Select Corr/Covariance... from the Stats menu
USE NORRAN         • Click OK
COVARIANCE


Here is the output for our example: 


Covariance Matrix

            F1       F2       F3

F1       1.911
F2       1.003    3.010
F3        .634    1.535    4.142


Notice that these are close, but not exactly equal to the population 
covariances because this sample is finite (1000). 
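
The link between the component loadings and the population covariance
matrix can be verified by arithmetic alone. The sketch below is plain
Python rather than SYSTAT, and the function name is hypothetical. It
multiplies the loading matrix from the F1-F3 equations by its own
transpose; because Z1-Z3 are independent standard normals, that product
is the covariance matrix the generated variables should have.

```python
# Rows are the coefficients of Z1, Z2, Z3 in the equations for F1, F2, F3.
loadings = [
    [0.666, 0.908, 0.856],
    [1.379, 0.766, -0.716],
    [1.742, -0.953, 0.240],
]


def implied_covariance(rows):
    """Return rows * transpose(rows), the covariance implied by the loadings."""
    n = len(rows)
    return [
        [sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)]
        for i in range(n)
    ]


cov = implied_covariance(loadings)
for row in cov:
    print(" ".join(f"{value:7.3f}" for value in row))
```

The variances work out to roughly 2, 3, and 4, and the covariances to
roughly 1, 0.5, and 1.5, which the sample values in the output above
approximate.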


Selecting random subsamples

The examples below illustrate two methods of taking random samples
without replacement from data files. The first extracts a percentage of
cases from a file, and the second a specific number of cases.

10.13 Selecting a percentage of cases

To pick a random sample of approximately three-fourths of a file, type:

USE USDATA
SAVE NEWDATA
IF URN>.75 THEN DELETE
RUN

To vary the sample size, change the .75 proportion to another number 
between 0 and 1. 


Here is another method which keeps both selected and deselected cases 
in the same file. The WEIGHT variable can be used with statistical 
procedures to select the random subsample for cross-validation. 


USE USDATA
SAVE NEWDATA
IF URN>.75 THEN LET WEIGHT=0
ELSE LET WEIGHT=1
RUN 
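
Because each case is kept or dropped by an independent uniform draw, the
sample size is only approximately the requested proportion. The following
Python sketch illustrates this under stated assumptions: it is not SYSTAT
code, flag_sample is a hypothetical name, and random.random stands in for
URN.

```python
import random


def flag_sample(n_cases, proportion, rng):
    """Return one WEIGHT per case: 0 if the uniform draw exceeds proportion, else 1."""
    return [0 if rng.random() > proportion else 1 for _ in range(n_cases)]


weights = flag_sample(1000, 0.75, random.Random(7))
selected_fraction = sum(weights) / len(weights)
print(f"{selected_fraction:.3f} of the cases were selected")
```

The selected fraction varies from run to run around .75; use the
exact-size method in the next example when the count must be fixed.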
10.14 Selecting a specific number of cases

To pick a random sample of an exact size from a file you must use the
HOLD command, which is discussed in more detail in the next section.
This program uses an algorithm due to Bebbington (1975). You should
replace orig# with the number of cases in the original file and sample#
with the number of cases you want in the sample.

USE USDATA
SAVE SDATAFIL
HOLD
IF CASE=1 THEN LET NF=orig#
IF CASE=1 THEN LET NS=sample#
LET RAND=URN
IF RAND>NS/NF THEN DELETE
ELSE LET NS=NS-1
LET NF=NF-1
DROP NF, NS, RAND
RUN 
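
The logic of the program above is easier to follow when written out
step by step. Here is a minimal Python sketch of the same sequential
selection idea; it is an analogue under stated assumptions, not SYSTAT
code, and the names sample_exact, remaining_in_file, and still_needed are
hypothetical stand-ins for the program, NF, and NS.

```python
import random


def sample_exact(records, n_sample, rng):
    """Keep exactly n_sample records, each chosen with probability NS/NF."""
    remaining_in_file = len(records)   # plays the role of NF
    still_needed = n_sample            # plays the role of NS
    chosen = []
    for record in records:
        # Keep the case when the uniform draw falls below NS/NF.
        if rng.random() < still_needed / remaining_in_file:
            chosen.append(record)
            still_needed -= 1
        remaining_in_file -= 1         # one fewer case left to read
    return chosen


cases = list(range(1, 51))             # 50 hypothetical case numbers
picked = sample_exact(cases, 10, random.Random(313))
print(picked)
```

When the cases still needed equal the cases still unread, the ratio is 1
and every remaining case is kept; when none are needed, the ratio is 0 and
none are kept. That is why the sample size is exact.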


Using the HOLD command

The HOLD command changes three default settings in SYSTAT.


1) Without HOLD, SYSTAT operates on one observation at a time and 
then clears out its memory before repeating the program on the next 
observation (case). With HOLD, SYSTAT holds values in memory 
from one case to the next. 


2) Without HOLD, SYSTAT sets the initial values of new numeric 
variables to missing (.). With HOLD, new numeric variables have initial 
values of zero (0). 


3) Without HOLD, SYSTAT clears the workspace after it executes a 
RUN. With HOLD, SYSTAT does not clear the workspace. You can 
execute another program on the current file without issuing a USE 
command. Also, SYSTAT holds the values from the last case of the 
previous operation in memory. 


HOLD is useful for tasks where you need to use the results of a previous 
calculation in a subsequent task. You will need to use HOLD when 
changing values of a variable by a constant increment, summing one or 
more variables (columns), summing subgroups, counting subgroups, and 
performing other complex calculations. 

HOLD is not a true BASIC command in that you cannot precede it
with a line number. You can, however, use HOLD with any BASIC 
program. 


HOLD stays in effect until you exit DATA or use NEW. Keep this in 
mind if you run several programs during one DATA session. 


10.15 Incrementing values of a variable

The following program creates variables X and Y; X increases by 3 for
each case and Y increases by 1 every five cases. The REPEAT 15 statement
tells SYSTAT to execute the operation 15 times, thereby creating
15 cases. Notice that X is initialized as zero the first time through this
program because of HOLD.

SAVE INCREM
HOLD
REPEAT 15
LET X = X+3
LET Y = 1+INT((CASE-1)/5)
LIST
RUN

X Y 
Case 1 3.000 1.000 
Case 2 6.000 1.000 
Case 3 9.000 1.000 
Case 4 12.000 1.000 
Case 5 15.000 1.000 
Case 6 18.000 2.000 
Case 7 21.000 2.000 
Case 8 24.000 2.000 
Case 9 27.000 2.000 
Case 10 30.000 2.000 
Case 11 33.000 3.000 
Case 12 36.000 3.000 
Case 13 39.000 3.000 
Case 14 42.000 3.000 
Case 15 45.000 3.000 


Without HOLD, SYSTAT would initialize X to missing. Therefore, all 
subsequent calculations done with X would result in missing values. 
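
The difference HOLD makes can be mimicked in a few lines of Python. This
is a sketch of the behavior described above, not SYSTAT code: run_cases is
a hypothetical name, and None stands in for SYSTAT's missing value (.).

```python
def run_cases(n_cases, hold):
    """Apply LET X = X+3 and LET Y = 1+INT((CASE-1)/5) to n_cases cases."""
    x = 0.0 if hold else None          # HOLD starts new variables at zero
    rows = []
    for case in range(1, n_cases + 1):
        if not hold:
            x = None                   # without HOLD, memory is cleared per case
        x = None if x is None else x + 3   # missing plus 3 stays missing
        y = 1 + (case - 1) // 5
        rows.append((case, x, y))
    return rows


for case, x, y in run_cases(15, hold=True):
    print(case, x, y)
```

Calling run_cases(15, hold=False) leaves X missing in every case, which
matches the remark above about running the program without HOLD.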
10.16 Summing a variable

The following program prints the sum of the variable WINE from the
data set USDATA.

USE USDATA
HOLD
IF WINE<>. THEN LET WINESUM = WINESUM+WINE
IF EOF THEN PRINT "Sum of Wine =", WINESUM
RUN



The first transformation includes the condition IF WINE<>. to make 
sure we do not add a missing value to WINESUM. If we did, 
WINESUM would become missing as well. The next transformation 
contains the condition IF EOF. EOF (end-of-file) is a built-in variable 
that is true if the current case is the last case in the file, but false
otherwise.

Sum of Wine = 138.980
50 cases and 44 variables processed.
SYSTAT file created.

10.17 Summing a variable for selected cases

The following program prints the sum of the variable WINE for all
values of DIVISION greater than 3. It works like the previous example
except that it adds another condition to the summation statement:

USE USDATA
HOLD
IF DIVISION>3 AND WINE<>. THEN LET WINESUM=WINESUM+WINE
IF EOF THEN,
PRINT "Sum of WINE for Divisions 4 through 9 is", WINESUM
RUN


Sum of WINE for Divisions 4 through 9 is 91.620 


50 cases and 44 variables processed. 
SYSTAT file created. 
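
The two summing programs above share one pattern: a total that starts at
zero, is carried from case to case, skips missing values, and is reported
on the last case. A minimal Python sketch of that pattern follows. It is
not SYSTAT code; the function name and the small data set are
hypothetical, and None stands in for a missing value.

```python
def held_sum(cases, variable, keep=lambda case: True):
    """Sum one variable over the cases that pass keep(), skipping missing values."""
    total = 0.0                              # HOLD initializes the sum to zero
    for case in cases:
        value = case[variable]
        if value is not None and keep(case):
            total += value                   # carried over to the next case
    return total                             # reported once, as with IF EOF


# Hypothetical cases; these are not the USDATA values.
cases = [
    {"DIVISION": 1, "WINE": 2.0},
    {"DIVISION": 4, "WINE": None},
    {"DIVISION": 5, "WINE": 3.5},
    {"DIVISION": 9, "WINE": 1.5},
]

print("Sum of WINE =", held_sum(cases, "WINE"))
print("Sum for divisions above 3 =",
      held_sum(cases, "WINE", lambda case: case["DIVISION"] > 3))
```

Without the missing-value test, one missing WINE would make the whole
total missing in SYSTAT, which is why the condition is included.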
10.18 Counting cases meeting a condition

The following program counts and prints the number of states in the
USDATA data set where the average consumption of spirits per person
is more than three gallons.

This program resembles the one above that sums WINE for cases
where DIVISION is greater than 3. In this program, however, instead of
summing a variable, we add 1 to a counting variable COUNT for each
case where SPIRITS is greater than 3.

USE USDATA
HOLD
IF SPIRITS>3 THEN LET COUNT=COUNT+1
IF EOF THEN PRINT 'COUNT =', COUNT
RUN

SYSTAT tells you that you have not saved your work to a file and asks if
you want to. If yes, hit Enter. SYSTAT responds:

COUNT = 9.000

50 cases and 44 variables processed.
No SYSTAT file created.

10.19 Standardizing a variable

The following program standardizes a variable, taking advantage of all
three features of the HOLD command: new variables are initialized to
zero, values are held in memory from one case to the next, and the
workspace is not cleared after a RUN.

Note: you could standardize data more easily with Data/Standardize...
in the Data Editor or with the STANDARDIZE command in DATA.
This example merely illustrates HOLD.

Three RUNs execute the procedure. The first creates a data file, the
second computes the sum (SUM), sum of squares (SUMSQ), and total
number of cases (N), and the third rereads the data and standardizes it.

NOTE 'This step creates a file of the raw data.'
SAVE RAW
INPUT X
RUN
9999991
9999992
9999993

3 cases and 1 variables processed.
SYSTAT file created.

NOTE 'This step computes sum, sum of squares, and count.'
USE RAW
HOLD
LET SUM=SUM+X
LET SUMSQ=SUMSQ+X*X
LET N=CASE
LIST X,N
RUN

                  X         N
Case 1  9999991.000     1.000
Case 2  9999992.000     2.000
Case 3  9999993.000     3.000

3 cases and 4 variables processed.
SYSTAT file created.

NOTE 'This step rereads the data and standardizes.'
SAVE STANDARD
IF CASE=1 THEN FOR
LET MEAN = SUM/N
LET SD = SQR((SUMSQ-SUM*MEAN)/(N-1))
NEXT
LET Z = (X-MEAN)/SD
LIST Z
RUN

              Z
Case 1   -1.000
Case 2    0.000
Case 3    1.000

3 cases and 6 variables processed.
SYSTAT file created.


Note that HOLD initializes SUM and SUMSQ to zero. If we do not 
use HOLD, they are initialized to missing (.). Also, because we used 
HOLD, SYSTAT holds the values for SUM and SUMSQ in memory 
from one case to the next. Without HOLD, SYSTAT treats the cases
independently and cannot execute any summations.


In the third step, we calculate MEAN and SD only for CASE=1 because 
the same values will be used to standardize all cases. SYSTAT retains 
their values from the prior RUN because of the HOLD command. 


The STANDARDIZE command does the same work more efficiently.
This example, however, gives us a chance to point out another
feature of SYSTAT. If your data have large means, small standard de- 
viations, and many observations, the “desk calculator” formulas used 
above can cause round-off errors. Since SYSTAT does its arithmetic in 
double precision, this should rarely happen. Even with the nasty exam- 
ple above, we can get away with the desk calculator formula because we 
have at least 15 decimal digits of precision. 


Here is a “provisional” algorithm for standardizing. Note that SUMSQ
is no longer the total sum of squares. MEAN and SUMSQ accumulate
variation about a provisional mean; after the last observation is
processed, the provisional mean becomes the actual mean. This algorithm is used
in all SYSTAT procedures that require sample moments. 


SAVE RAW
INPUT X
RUN
9999991
9999992
9999993
~

HOLD
USE RAW
LET WT = 1-1/CASE
LET XS = X-MEAN
LET MEAN = MEAN+XS/CASE
LET SUMSQ = SUMSQ+WT*XS*XS
LET N = CASE
LIST X,N
RUN

SAVE STANDARD
IF CASE = 1 THEN LET SD=SQR(SUMSQ/(N-1))
LET Z = (X-MEAN)/SD
LIST Z
RUN 
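
The provisional update is compact enough to trace by hand or in another
language. The following Python sketch mirrors the two passes of the
program above; it is an illustration under stated assumptions, not SYSTAT
code, and the function names are hypothetical.

```python
import math


def provisional_moments(values):
    """First pass: update MEAN and SUMSQ about the running (provisional) mean."""
    mean = 0.0
    sumsq = 0.0
    n = 0
    for case, x in enumerate(values, start=1):
        wt = 1 - 1 / case          # LET WT = 1-1/CASE
        xs = x - mean              # deviation from the provisional mean
        mean += xs / case          # LET MEAN = MEAN+XS/CASE
        sumsq += wt * xs * xs      # LET SUMSQ = SUMSQ+WT*XS*XS
        n = case
    return mean, sumsq, n


def standardize(values):
    """Second pass: convert each value to a z score."""
    mean, sumsq, n = provisional_moments(values)
    sd = math.sqrt(sumsq / (n - 1))
    return [(x - mean) / sd for x in values]


z = standardize([9999991, 9999992, 9999993])
print([round(value, 3) for value in z])
```

Because each step works with deviations from the current mean rather than
with raw squares, the accumulated quantities stay small even when the
values themselves are large.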




The HOLD command does not limit SYSTAT to two-pass transforma- 
tions. Indeed, it is possible to make three or more passes on the same 
file for smoothing, lagging variables, or other procedures. Remember, 
however, to try your program on a few cases to debug it before commit- 
ting it to the entire data file. 




10.20 Lagging variables

You can use HOLD to create lagged variables. A lag is a copy of a
variable offset by a number of cases. Lags are used in time series and
forecasting to see if a variable is auto- or self-correlated. The built-in
function LAG does the same operation as the example below. In the
following example, LAG is a lag of X offset by one case:

  X    LAG
  2      .
  4      2
  6      4
  8      6
 10      8

First, make a file with X.

SAVE FILE1
INPUT X
RUN
2
4
6
8
10
~


Now, create LAG. 


NEW
USE FILE1
SAVE FILE2
HOLD
10 LET LAG=TEMP
15 IF CASE=1 THEN LET LAG=.
20 LET TEMP = X
DROP TEMP
LIST X LAG
RUN

X LAG 
Case 1 2.000 : 
Case 2 4.000 2.000 
Case 3 6.000 4.000 
Case 4 8.000 6.000 
Case 5 10.000 8.000 
5 cases and 2 variables processed. 


SYSTAT file created. 


Note that the DROP command prevents TEMP from being saved into 
FILE2. 


The program works as follows: 


LAG and TEMP are new variables with initial values of 0. 

Case 1. The first value of LAG is set to missing. TEMP stores X’s value
for Case 1, which is 2.

Case 2. Because we are using HOLD, SYSTAT remembers the value
for TEMP (which is 2). The TEMP value is given to LAG’s second
case. Thus, the second value for LAG is now equal to the first value of
X. Next, SYSTAT moves the second value for X (4) into TEMP.

Case 3. The TEMP value (4) goes into LAG, and the case 3 value of X
(6) goes into TEMP.

Case 4. The TEMP value (6) goes into LAG, and X’s value (8) goes
into TEMP.

The process continues until the last case is reached. The final X value
goes into TEMP and is never used. The final LAG value is the
penultimate X value.
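
The case-by-case trace above can be reproduced with a short Python
sketch. It is an analogue of the three numbered statements, not SYSTAT
code; lag_one is a hypothetical name and None stands in for missing.

```python
def lag_one(values):
    """Return the one-case lag of values, with a missing first entry."""
    temp = 0.0                     # HOLD starts TEMP at zero
    lagged = []
    for case, x in enumerate(values, start=1):
        lag = temp                 # 10 LET LAG=TEMP
        if case == 1:
            lag = None             # 15 IF CASE=1 THEN LET LAG=.
        temp = x                   # 20 LET TEMP = X
        lagged.append(lag)
    return lagged


print(lag_one([2, 4, 6, 8, 10]))
```

The order of the statements matters: LAG must take the held value of TEMP
before TEMP is overwritten with the current X.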

10.21 Saving the last n cases of a file

This example shows you how to save the last n cases of an existing file to
a new file. The program is useful if you regularly add data to a file and
wish to do analyses on only the most recent cases.

The operation is done in two parts. The first establishes the total
number of cases in the file (N). SYSTAT holds the end value of N from the
first part in memory. It uses this value in the second part to determine
which cases to save into the new file.

The program below saves the last 10 cases from USDATA into
NEWFILE. Include the LIST REGION statement in the second part
to show the cases that SYSTAT saves.

USE USDATA 
HOLD 

LET N=CASE 
RUN 


SAVE NEWFILE 
IF N-CASE>=10 THEN DELETE 
LIST REGION 
RUN 
          REGION
Case 41    4.000
Case 42    4.000
Case 43    4.000
Case 44    4.000
Case 45    4.000
Case 46    4.000
Case 47    4.000
Case 48    4.000
Case 49    4.000
Case 50    4.000


10 cases and 44 variables processed. 
SYSTAT file created. 


SYSTAT has saved the last ten cases for all the variables into 
NEWFILE. 
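
The two-part logic can be sketched in Python as follows. This is an
illustration of the idea, not SYSTAT code; last_cases is a hypothetical
name, and the list of integers is a stand-in for a 50-case file.

```python
def last_cases(rows, keep):
    """Return (case number, row) pairs for the last `keep` rows of a file."""
    # Part 1: read through the file once; N ends up as the last case number.
    n = 0
    for case_number, _ in enumerate(rows, start=1):
        n = case_number

    # Part 2: reread the file and drop every case with N-CASE >= keep.
    saved = []
    for case_number, row in enumerate(rows, start=1):
        if n - case_number >= keep:
            continue               # IF N-CASE>=10 THEN DELETE
        saved.append((case_number, row))
    return saved


rows = list(range(50))             # hypothetical 50-case file
print([case_number for case_number, _ in last_cases(rows, 10)])
```

With 50 cases and keep set to 10, the cases saved are 41 through 50, in
agreement with the listing above.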


10.22 Subgroup means

This program calculates the mean of the variable SPIRITS for each of
the four regions in USDATA. On the last case for each group, we ask
SYSTAT to calculate and print the mean, and reset SUM and N to 0.
Of course, the same operation can be performed by using Data/By
groups... and Stats/Statistics... In either case, the file must be sorted
on the REGION variable when used with BY.


USE USDATA
BY REGION
HOLD
LET N=N+1
LET SUM=SUM+SPIRITS
IF EOG THEN FOR
LET MEAN=SUM/N
PRINT "The mean spirits consumption rate"
PRINT " for Region", REGION," is", MEAN
LET SUM=0
LET N=0
NEXT
RUN 

The mean spirits consumption rate
 for Region 1.000 is 3.149
The mean spirits consumption rate
 for Region 2.000 is 2.203
The mean spirits consumption rate
 for Region 3.000 is 2.256
The mean spirits consumption rate
 for Region 4.000 is 2.785

390 cases and 46 variables processed.
No SYSTAT file created.


BY REGION sets REGION as the grouping variable for the IF EOG
THEN FOR statements. After each case that is the last case in a
REGION group (the last region 1 case, the last region 2 case, etc.),
SYSTAT computes a mean from the sum it has computed for the group.


The BY command and the EOG variable are discussed in more detail in 
the previous chapter, Subgroup processing. 
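
The accumulate-and-reset pattern of Example 10.22 can be written out in
Python as a rough analogue. The sketch below is not SYSTAT code: the
function name and the data are hypothetical, and the end-of-group test is
an assumed stand-in for the built-in EOG variable.

```python
def subgroup_means(rows):
    """rows: (group, value) pairs already sorted by group."""
    means = []
    total = 0.0
    n = 0
    for i, (group, value) in enumerate(rows):
        n += 1                               # LET N=N+1
        total += value                       # LET SUM=SUM+SPIRITS
        # True on the last case of each group, like EOG.
        end_of_group = i == len(rows) - 1 or rows[i + 1][0] != group
        if end_of_group:
            means.append((group, total / n))
            total = 0.0                      # reset for the next group
            n = 0
    return means


# Hypothetical (REGION, SPIRITS) pairs; these are not the USDATA values.
rows = [(1, 3.0), (1, 4.0), (2, 1.0), (2, 2.0), (2, 3.0)]
print(subgroup_means(rows))
```

If the rows are not sorted by group, the end-of-group test fires whenever
the group value changes, which is why the file must be sorted first.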

APPEND        Joins files vertically
ARRAY         Aliases variables to an array of subscripted variables
CODE          Recodes values
DELETE        Deletes current case from the file that is saved
DIAGONAL      Signals that the diagonal is missing from the triangular
              matrix being input
DIM           Creates new subscripted variables
DROP          Drops variable from the file that is saved
ELSE          Executes subsequent statements if the previous IF test
              was not met
ERASE         Erases lines from a SYSTAT BASIC program
FOR...NEXT    Begins a loop or statement group
GET           Reads data from plain text file
GOTO          Detours to a specific line in a BASIC program
HOLD          Retains values from one operation to the next
IF...THEN     Executes operation if condition is met
INPUT         Inputs text data in fixed format or free format
LABEL         Creates value labels
LET           Assigns a value to a variable
LIST          Displays values of the variables you specify
MAC           Saves data in a plain text file with tab delimiters
NEXT          Ends a FOR...NEXT loop
PRINT         Prints variables and/or character strings
PUT           Saves data in a plain text file with comma delimiters
RANK          Replaces values in variable with ranks
REPEAT        Specifies the number of cases to process
RSEED         Sets random number seed
RUN           Runs a SYSTAT job
SORT          Sorts data
STANDARDIZE   Standardizes variables
STOP          Stops processing of a case
TRANSPOSE     Transposes a file
TYPE          Reads non-rectangular data
USE           Opens data file, or joins two files horizontally


See Chapter 6 for information on built-in functions and variables. 


Eight global commands (FORMAT, HELP, NOTE, OPTIONS, PAGE,
SELECT, SUBMIT, and QUIT) are discussed at the beginning of the
reference.

Overview

This appendix presents syntax information and summaries of all the
DATA commands in alphabetical order. We identify the chapter(s) that
introduce each command in the left margin.

The opposite page shows all the DATA commands with brief
explanations.

The command reference begins with a list of common commands that
are available globally, in SYSTAT, SYGRAPH, and DATA. They
might be useful in DATA. These commands are also shown in the
Command reference appendices of the SYSTAT and SYGRAPH
volumes.

Command reference

Global commands

FORMAT=n
    Specifies the number of decimals to display in output (0 < n < 9).
    The default is 3. COLD.

    /UNDERFLOW
        Prints in exponential notation tiny numbers that otherwise
        would appear as "0".

HELP command
    Displays help for specified command. HOT.

NOTE=n | 'line1' ['line2' ...]
    Prints any note (character string) in the text output, or the
    printer or file if output is redirected.

    NOTE can print ASCII characters by their index; specify the index
    number without quotation marks. For instance, NOTE=13 puts a
    carriage return in the output. (You can specify both an index and
    a character string in a single line.) HOT.

OPTIONS
    Displays options currently in effect. Not available in the Data
    Editor. HOT.

PAGE
    Selects screen display and printer output format and
    characteristics with the options that follow. COLD.

    /FILE=n
        Specifies the number of lines per page in a text file. The
        default value is 56. Specify 0 for no page breaks.

    PRINTER=n
        Specifies the number of lines per printed page. The default
        value is 56. Specify 0 for no page breaks.

    NARROW | WIDE
        Specifies the number of columns used for analysis (text)
        output. NARROW, the default, specifies 80 columns; WIDE
        specifies 132.

    TITLE='line1', 'line2', ...
        Specifies a title for each page of output. You may specify up
        to 10 title lines; each line may be up to 132 characters long.
        SYSTAT centers each line on the page.

SELECT exprn1 [...]
    Selects a subgroup of cases for analysis. You may include up to 10
    expressions. Only cases meeting all the expression conditions are
    used. Using the SELECT command without an argument ends current
    selection conditions. COLD.

SUBMIT "filename"
    Reads and executes commands contained in filename. Filename must
    be a text file. HOT.

    /ECHO
        Displays the commands in filename as they are processed.

QUIT [= * | @ | "filename"]
    Exits SYSTAT to the Finder. The optional arguments let you output
    the command log to a specific destination. Asterisk (*) sends the
    command log to the screen only. At-sign (@) sends the command log
    to a printer and the screen. "Filename" sends the command log to
    the screen and a text file called filename. Not available in the
    Data Editor. HOT.



DATA commands

Chapter 5     APPEND file1 file2
    Creates a new file (named by a SAVE command) by appending cases of
    file2 at the bottom after cases of file1. Both files must contain
    the same variables, in the same order, but they can have different
    numbers of cases. You must use SAVE before APPEND, which is HOT.

Chapter 10    ARRAY array / varlist
    Aliases the variables in varlist to an array of subscripted
    variables. The variables have the root name array with integer
    subscripts 1 through n, where n is the number of variables in
    varlist. Note that ARRAY works differently in versions prior
    to 3.2.

Chapter 6     CODE varlist / old1=new1, old2=new2, ..., oldp=newp
    Recodes the variables listed in varlist. For all the variables in
    varlist, any case with value old1 is replaced with value new1.
    All occurrences of old2 are replaced with new2, etc.

    All variables in varlist must be the same type (character or
    numeric), and the oldp and newp values must correspond to the
    variable type. Surround character strings with single or double
    quotation marks.

Chapter 5     DELETE
    Prevents the current case from being written to the SAVE file.
    Usually, use DELETE with an IF...THEN command.

Appendix I: Command reference

Chapter 3     DIAGONAL=PRESENT | ABSENT
    Specifies whether the matrix you are entering has values in the
    diagonal cells. The diagonal is assumed to be present unless you
    state otherwise with DIAGONAL=ABSENT.

Chapter 7     DIM var(n)
    Reserves space for a new variable var with subscript n, where n is
    an integer between 1 and 99 inclusive.

Chapter 5     DROP varlist
    Prevents the variables given by varlist from being written to the
    file named by SAVE.

Chapter 7     ELSE statement
    Can follow an IF...THEN command. Statement is executed when the
    IF exprn evaluates as false. The statement can be any valid
    command, including DELETE or another IF...THEN command.

Chapter 7     ERASE n1[-n2]
    Erases all numbered BASIC statements from n1 to n2, inclusive, or
    erases the line numbered n1, if a single number (rather than a
    range) is specified. The default, if no range is specified, is all
    numbered statements.

Chapter 7     FOR [index=n1 TO n2 [STEP=n3]] ... NEXT
    Starts a FOR...NEXT loop. Index must be a numeric variable, either
    from your file or a new variable. You must specify n1, but n2 is
    optional. You may optionally specify an increment value with the
    STEP=n3 phrase; the default is +1. You may specify any real number
    for n1-n3. See text for instructions on using FOR...NEXT with and
    without an index.

Chapter 3     GET filename
    Reads the ASCII (plain text) file filename.

Chapter 7     GOTO n
    Detours the program to the statement numbered n. You must have
    numbered line statements in your program to use GOTO.

Chapter 10    HOLD
    Initializes all numeric values in a BASIC program to zero and
    retains numeric values from one case to the next. HOLD stays in
    effect until you QUIT the program, type NEW, or USE another file.

Chapters 6, 7 IF exprn THEN statement
    Executes statement if the exprn evaluates as true. Exprn may be
    any valid expression formed with numbers, variables, operators,
    and functions. Statement may be any valid command, including
    DELETE. You can follow IF...THEN constructions with ELSE (see).

Chapter 3     INPUT varlist
    Names the variables (and indicates order) that will be read into
    SYSTAT. You may identify a range of variables in varlist using
    subscript notation.

    For fixed-format input, INPUT has two arguments, each enclosed in
    parentheses: INPUT (varlist) (format). As above, varlist indicates
    the variable names, in order. Format is a format description in
    special notation, discussed in this chapter.

    For free-format input (first syntax above), place a backslash
    after varlist to force SYSTAT to start a new case for each line of
    data and to use every value entered in each row, even if it must
    start filling new cases to do so. See Example 3.8.

Chapter 6     LABEL varlist / old1=label1, old2=label2, ..., oldp=labelp
    Creates a character variable for each numeric variable in varlist.
    Varlist can contain numeric variables only. For each numeric
    variable, a character variable with the same name plus $ is
    created, with values as given by oldi=labeli. If any character
    variable already exists, its values are replaced.

Chapter 6     LET var=exprn
    Assigns the value of exprn to the variable var. You may use either
    a numeric or character variable. Character values must be
    surrounded by single or double quotation marks.

Chapter 2     LIST [varlist]
    Lists the contents of the file named by the USE statement. Varlist
    is an optional list of variables for viewing only a portion of the
    file. Note that LIST replaces the CASELIST command of versions
    prior to 3.2.

Chapter 4     MAC filename
    Saves data in a SYSTAT data file into a plain text (ASCII) file
    with tab delimiters.

Chapter 7     NEXT
    Ends a FOR...NEXT (see) loop.

Chapters 2, 7 PRINT varlist | 'string'
    Displays the values of the variables listed in varlist, or
    displays the character string you specify. Varlist may include
    numeric or character variables.

Chapter 4     PUT filename
    Saves data in a SYSTAT data file into a plain text (ASCII) file.

Chapter 8     RANK varlist
    Transforms all numeric variables in varlist to ranks. Each
    variable is ranked within its own distribution. The default is all
    numeric variables in the file.

Chapters 2, 7 REPEAT n
    Applies subsequent commands to the first n cases in the file. When
    you first use a file, the default for REPEAT is the number of
    cases in the file. Otherwise, the default n is 0.

Chapter 10    RSEED=n
    Specifies the random number seed n. The default is 313. You can
    specify any integer between 1 and 30,000.

Chapter 2     RUN
    Sets a DATA procedure in motion. HOT.

Chapter 8     SORT varlist
    Sorts the data file on the variables specified in varlist. Varlist
    can include numeric or character variables or both. The default is
    all variables in the file, in the order that they appear in the
    file.

Chapter 8     STANDARDIZE varlist
    Standardizes the numeric variables named in varlist. The default
    is all numeric variables in the file.

Chapter 7     STOP
    Stops execution of a BASIC program.

Chapter 5     TRANSPOSE
    Transposes a data file by turning rows (cases) into columns
    (variables) and vice versa. You can only transpose files with
    numeric data. TRANSPOSE can handle a maximum of 99 cases (before
    transposing).

Chapter 3     TYPE = RECTANGULAR | SSCP | COVARIANCE | CORRELATION |
              DISSIMILARITY | SIMILARITY
    Specifies the type of matrix you are entering. Use
    DIAGONAL=ABSENT if the diagonal values are missing.

Chapters 2, 5 USE filename [(varlist)]
    Retrieves the file filename. If you include the optional varlist,
    USE retrieves only the specified variables from the SYSTAT file
    filename.

              USE file1 [(varlist)] file2 [(varlist)]
    Brings both file1 and file2 into the active workspace. You can
    merge these files into a single third file. Use the optional
    varlists if you want to merge only portions of the file(s).

Appendix II: SYSTAT file structure

SYSTAT data files have a simple structure. Data are stored in
cases-by-variables format, with each case written as a single unformatted
FORTRAN record. Triangular matrices are stored in the same form
(i.e., as many cases as variables) with entries above the diagonal replaced
by missing values.

Records

First record

The first record contains three integer variables: Version, Release, and
Mod. For Version 3.0 and 4.0, these are 30, 0, and 0, respectively.
Notice that NV, MTYPE, and NTYPE are used to read these variables
in subroutine GETLAB below. This allows Version 2 and later to read
Version 1 files.

Second record

The second and later records contain comment fields, one record per
comment. These records are terminated by a record that begins with a
$.

Third record

The next record contains three integer variables. NV is the number of
variables in the file. MTYPE is the type of file. The values of MTYPE
are:

1 = Rectangular data
2 = SSCP matrix
3 = Covariance matrix
4 = Correlation matrix
5 = Dissimilarity matrix
6 = Similarity matrix

You may extend these values to accommodate other types, but inform
SYSTAT for compatibility.

The third variable, NTYPE, specifies the precision of the numerical
data in the file. Double precision (standard) is NTYPE=2, and single
precision is NTYPE=1.

The next NV records contain the variable labels in the first 12 bytes of
each record. On some operating systems with 128 byte or longer
records, this wastes some space, but putting labels on separate records in
the file simplifies variable subsetting algorithms. Labels are right
justified in the first 8 characters of the label field (i.e., leading blanks). The
remaining 4 characters are reserved for subscripts. Character variable
labels have no subscripts, and are identified by a $ in the 9th byte of the
label. Otherwise, labels are for numerical variables. The subroutine
GETLAB below reads NV, MTYPE, and NTYPE, fills the array LAB
with NV labels, and records how many variables are of type numerical
(ND) and character (NK).

Remaining records

The remaining records contain the data until the end of file marker.
ND numerical and NK character variables are written in a single
unformatted WRITE. Character variable values are stored as 12 bytes, left
justified (padded on the right with blanks).

Subroutines

The subroutine GETLAB below reads the header information from the
file. Once it has been called, you have the information necessary to read
the records in the file with RSYS. GETLAB thus can be used to rewind
a file to begin reading it again with RSYS.

The subroutine RSYS below reads a record (case) from a SYSTAT file
and places the values into the same order in DAT and KHR as they are
in the LAB labels array. After RSYS is called to read a record, values for
a character variable in column i of the LAB array can be found in
column i of KHR and values for a numerical variable in column j in the
LAB array can be found in column j of DAT. If you are writing routines
that process only numerical data, you can call RSYS and ignore KHR
(although be sure to dimension it correctly in the calling routine). RSYS
filters character variables out of the file by putting them in KHR.

The subroutine WSYS writes a record to a SYSTAT file. You must set
ND to the number of numerical variables in LAB and NK to the
number of character variables in KHR. If you have no character variables, set
NK=0 and WSYS will write only numerical variables into the file. You
may set ND=0 for writing only character variables.

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Appendix Il: SYSTAT file structure 


DAT and KHR thus serve as I/O buffers whose contents correspond 
(column by column) with the labels in LAB. If you wish to use random 
access I/O, you can use WSYS to get sequential records from a 
SYSTAT file and save them into a direct access file with REC=N, where 
N is the current record number of the record just read from the 
SYSTAT file. Some of the SYSTAT routines do this for sorting and 
other tasks. 


Missing data are represented by a variable DMIS, which is -1.0D36. 
The single precision missing data value is RMIS, which is -1.0E36. All 
real arithmetic is bounded by -OFLO and +OFLO, where OFLO is 1.0D35. 
Single precision overflow is ROFLO, which is 1.0E35. Machine precision is 
EPS, which is 1.0D-15. Single precision machine precision is REPS, 
which is 1.0E-7. Greater precision is available on most machines, but 
these numbers ensure common arithmetical bounds on all machines with 
double precision arithmetic. SYSTAT prints a maximum of 12 digits, so 
there are always at least 3 digits (usually more) of fuzz to allow for 
roundoff errors. On MS-DOS machines, all INTEGER variables in these 3 
subroutines are INTEGER*2. On other machines, they are 
INTEGER*4. 
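
The way the two missing value codes relate can be sketched in Python. This is an illustration under stated assumptions, not SYSTAT code: the function names `to_single`, `widen`, and `narrow` are hypothetical, and single precision is simulated with the `struct` module. The single precision constant, once widened, need not compare equal to the double precision constant, so the sketch maps one code to the other explicitly, as RSYS and WSYS do.

```python
import struct

DMIS = -1.0e36  # double precision missing value


def to_single(x):
    """Round a Python float to single precision and widen it again."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


RMIS = to_single(-1.0e36)  # single precision missing value


def widen(values):
    """Single to double: map RMIS to DMIS, pass other values through."""
    return [DMIS if v == RMIS else v for v in values]


def narrow(values):
    """Double to single: map DMIS to RMIS, round other values."""
    return [RMIS if v == DMIS else to_single(v) for v in values]


single_case = [RMIS, 1.5]
print(widen(single_case))
print(narrow([DMIS, 0.25]))
```

Comparing a value against the constant of its own precision, as these helpers do, avoids depending on how the constant rounds.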


C***********************************************************************
      SUBROUTINE GETLAB (LAB,MTYPE,NTYPE,KU,EOF,ND,NK,NV,MV)
C
C GET SYSTAT FILE HEADER INFO
C
C YOU MUST PREVIOUSLY HAVE OPENED FILE TO READ WITH:
C
C   OPEN (KU,FILE=name,STATUS='OLD',FORM='UNFORMATTED')
C
C LAB   = VARIABLE LABEL ARRAY
C MTYPE = TYPE OF FILE (1=RECT,2=SSCP,3=COVA,4=CORR,5=SIMI,6=DISS)
C NTYPE = NUMERICAL DATA TYPE (1=SINGLE PRECISION, 2=DOUBLE PRECISION)
C KU    = INPUT UNIT NUMBER
C EOF   = END OF FILE
C ND    = NUMBER OF DATA ITEMS (NUMERIC VARIABLES) IN RECORD
C NK    = NUMBER OF CHARACTER ITEMS (CHARACTER VARIABLES) IN RECORD
C NV    = ND+NK (TOTAL NUMBER OF VARIABLES PER CASE)
C MV    = MAXIMUM NUMBER OF VARIABLES IN FILE
C
C NV IS USED IN FIRST READ TO REPRESENT VERSION
C MTYPE IS USED IN FIRST READ TO REPRESENT RELEASE
C IF NTYPE IS NEGATIVE IN FIRST READ, VERSION .GE. 2
C

      LOGICAL EOF
      CHARACTER*1 LAB
C IF MS-DOS OR CP/M UNCOMMENT FOLLOWING LINE
C     INTEGER*2 KVER,KREL,KMOD
      DIMENSION LAB(12,MV)
C
      COMMON /VERSN/ KVER,KREL,KMOD
C
      EOF=.TRUE.
      REWIND KU
      READ (KU,END=100,ERR=100) NV,MTYPE,NTYPE
      IF (NTYPE.GT.KMOD) GO TO 5
      IF (NV+MTYPE.GT.KVER+KREL) GO TO 100
    1 READ (KU,END=100,ERR=100) ((LAB(I,J),I=1,12),J=1,6)
      IF (LAB(1,1).NE.'$') GO TO 1
      DO 3 I=1,12
      DO 2 J=1,MV
      LAB(I,J)=' '
    2 CONTINUE
    3 CONTINUE
      READ (KU,END=100,ERR=100) NV,MTYPE,NTYPE
    5 ND=0
      NK=0
      DO 10 J=1,NV
      READ (KU,END=100,ERR=100) (LAB(I,J),I=1,12)
      IF (LAB(9,J).NE.'$') ND=ND+1
      IF (LAB(9,J).EQ.'$') NK=NK+1
   10 CONTINUE
      EOF=.FALSE.
  100 RETURN
      END


C***********************************************************************
      SUBROUTINE RSYS (LAB,KHR,STA,DTA,NTYPE,KU,EOF,ND,NK,MV)
C
C READ SYSTAT FILE RECORD
C
C LAB   = VARIABLE LABEL ARRAY
C KHR   = CHARACTER DATA ARRAY
C STA   = SINGLE PRECISION DATA ARRAY
C DTA   = DOUBLE PRECISION DATA (EQUIVALENCE TO STA IN CALLING ROUTINE)
C NTYPE = NUMERICAL DATA TYPE (1=SINGLE, 2=DOUBLE)
C KU    = INPUT UNIT NUMBER
C EOF   = END OF FILE
C ND    = NUMBER OF DATA ITEMS IN RECORD
C NK    = NUMBER OF CHARACTER ITEMS IN RECORD
C MV    = MAXIMUM NUMBER OF VARIABLES IN FILE
C
      DOUBLE PRECISION DMIS
      DOUBLE PRECISION DTA
      LOGICAL EOF
      CHARACTER*1 LAB,KHR
C
      DIMENSION LAB(12,MV),KHR(12,MV),STA(MV),DTA(MV)
C
      RMIS=-1.0E36
      DMIS=-1.0D36
      NV=ND+NK
      EOF=.TRUE.
      IF (ND.EQ.0) GO TO 6
      IF (NTYPE.EQ.2) GO TO 5
C SINGLE PRECISION FILE
      IF (NK.GT.0) READ (KU,END=100,ERR=100) (STA(J),J=1,ND),
     *   ((KHR(I,J),I=1,12),J=1,NK)
      IF (NK.EQ.0) READ (KU,END=100,ERR=100) (STA(J),J=1,ND)
C WIDEN FROM THE LAST VALUE BACK, SINCE STA AND DTA SHARE STORAGE
      DO 4 I=1,ND
      IM=ND-I+1
      IF (STA(IM).NE.RMIS) DTA(IM)=STA(IM)
      IF (STA(IM).EQ.RMIS) DTA(IM)=DMIS
    4 CONTINUE
      GO TO 7
C DOUBLE PRECISION FILE
    5 IF (NK.GT.0) READ (KU,END=100,ERR=100) (DTA(J),J=1,ND),
     *   ((KHR(I,J),I=1,12),J=1,NK)
      IF (NK.EQ.0) READ (KU,END=100,ERR=100) (DTA(J),J=1,ND)
      GO TO 7
C CHARACTER DATA ONLY IN FILE
    6 READ (KU,END=100,ERR=100) ((KHR(I,J),I=1,12),J=1,NK)
C UNPACK
    7 IF (NK.EQ.0) GO TO 90
      NDD=ND+1
      NKK=NK+1
      DO 20 J=1,NV
      JM=NV-J+1
      IF (LAB(9,JM).EQ.'$') GO TO 10
      NDD=NDD-1
      DTA(JM)=DTA(NDD)
      GO TO 20
   10 NKK=NKK-1
      DO 15 M=1,12
      KHR(M,JM)=KHR(M,NKK)
   15 CONTINUE
   20 CONTINUE
   90 EOF=.FALSE.
  100 RETURN
      END
      SUBROUTINE WSYS (LAB,KHR,STA,DTA,NTYPE,KU,ND,NK,MV)
C
C WRITE SYSTAT FILE RECORD (DATA MUST BE IN DTA)
C
C LAB   = VARIABLE LABEL ARRAY
C KHR   = CHARACTER DATA ARRAY
C STA   = SINGLE PRECISION DATA ARRAY
C DTA   = DOUBLE PRECISION DATA (EQUIVALENCE TO STA IN CALLING ROUTINE)
C NTYPE = NUMERIC DATA TYPE (1=SINGLE, 2=DOUBLE)
C KU    = OUTPUT UNIT NUMBER
C ND    = NUMBER OF DATA ITEMS IN RECORD
C NK    = NUMBER OF CHARACTER ITEMS IN RECORD
C MV    = MAXIMUM NUMBER OF VARIABLES IN FILE
C
      DOUBLE PRECISION DMIS
      DOUBLE PRECISION DTA
      CHARACTER*1 LAB,KHR
C
      DIMENSION LAB(12,MV),KHR(12,MV),STA(MV),DTA(MV)
C
      RMIS=-1.0E36
      DMIS=-1.0D36
      NV=ND+NK
      IF (ND.EQ.0) GO TO 95
      IF (NK.EQ.0) GO TO 30
      NDD=0
      NKK=0
C PACK
      DO 20 J=1,NV
      IF (LAB(9,J).EQ.'$') GO TO 10
      NDD=NDD+1
      DTA(NDD)=DTA(J)
      GO TO 20
   10 NKK=NKK+1
      DO 15 M=1,12
      KHR(M,NKK)=KHR(M,J)
   15 CONTINUE
   20 CONTINUE
   30 IF (NTYPE.EQ.2) GO TO 90
C SINGLE PRECISION OUTPUT
      DO 85 J=1,ND
      IF (DTA(J).NE.DMIS) STA(J)=DTA(J)
      IF (DTA(J).EQ.DMIS) STA(J)=RMIS
   85 CONTINUE
      IF (NK.GT.0) WRITE (KU,ERR=100) (STA(J),J=1,ND),
     *   ((KHR(I,J),I=1,12),J=1,NK)
      IF (NK.EQ.0) WRITE (KU,ERR=100) (STA(J),J=1,ND)
      GO TO 100
C DOUBLE PRECISION OUTPUT
   90 IF (NK.GT.0) WRITE (KU,ERR=100) (DTA(J),J=1,ND),
     *   ((KHR(I,J),I=1,12),J=1,NK)
      IF (NK.EQ.0) WRITE (KU,ERR=100) (DTA(J),J=1,ND)
      GO TO 100
C CHARACTER OUTPUT ONLY
   95 IF (NK.GT.0) WRITE (KU,ERR=100) ((KHR(I,J),I=1,12),J=1,NK)
  100 RETURN
      END

Index 


ABS function, 73 

Absolute value function, 73 

ACS function, 73 

Addition (+), 73 

Advanced applications of SYSTAT BASIC, 135-53 
AND, 74-75, 80 

APPEND, 54, 61, 66-67 

Appending files vertically, 61, 66-67 
Arc hyperbolic tangent function, 73 
Arccosine function, 73 

Arcsine function, 73 

Arctangent function, 73 

Arithmetic operators, 73 

ARRAY, 106, 134, 135-42 

Array variables (see ARRAY) 


ASCII files 
Reading, 33-35 
Saving into, 44-51 
ASN function, 73 

ATH function, 73 
ATN function, 73 
AVG function, 74 


Backslash, 23-23 

BCF function, 74 

Beginning of file variable (see BOF variable) 
Beginning of group variable (see BOG variable) 
Beta distribution functions, 74 

BIF function, 74 

BOF variable, 74 

BOG variable, 74 

BRN function, 74 

Built-in grouping variables, 74, 129-31 

BY, 128, 129-31 

By groups, 129-31 


Caret (^) (see Exponentiation) 
CASE, 74 
CDF (see ZCF) 
Changing variable type, 49 
Character variables, 16 
Creating using LABEL, 84-86 
Chi-square distribution function, 74, 139 
CODE, 70, 82-84 
COLD commands, 8 
Combining files, 61-67 
Horizontally (see Merging files horizontally) 
Vertically (see Appending files vertically) 
Command equivalents to mouse clicks, generating, 6 
Command window, 6 
Command reference, 154-64 
Computation, 110 
Conditional transformations (see IF... THEN command) 
COS function, 73 
CORRELATION matrix, 31 
Cosine function, 73 
COVARIANCE matrix, 31 
Creating a data file 
Converting data files to ASCII files, 48 
Deleting cases, 57-59 
Dropping variables, 55-57 
Editing program lines 
Expressions, 96 
File manipulation, 52-67 
Labeling values, 84-86 
Listing a data file, 39-41 
Merging files, 61, 62-66 
Missing data, 21-23, 75, 98-99 
Printing a data file, 39-43 
Random data samples, 142-43 
Random distributions, 74 
Random numbers, 74, 136-42 
Ranking variables, 120, 120-24 
Recoding values, 82-84 
Selecting subsets of variables, 48 
Sorting variables, 115-20 
Standardizing data, 123-24, 146-49 
Statements, 95-96 


Subgroup processing, 126-31 

Temporary data files, 72 

Transformations, 68-87, 93-94 

Transposing a file, 59-60 

Triangular files, 31-35 
Creating an ASCII file from a SYSTAT file, 48 
Creating new variables, 78, 84-86 


Data files 
ASCII (see ASCII files) 
SYSTAT (see SYSTAT data files) 
Decimal places, 47 
DELETE, 54, 57-59 
Deleting cases, 57-59 
DIAGONAL, 12 
DIM, 90, 107 
DISSIMILARITY matrix, 31 
Division (/), 73 
DROP, 54, 55-57 
Dropping variables, 55-57 




ECF function, 74 
Editing a BASIC program, 94-95 
EIF function, 74 
ELSE, 90, 99-100 
End of file variable (see EOF variable) 
End of group variable (see EOG variable) 
Entering data, 10-35 
From ASCII file, 14, 18-19 
From a keyboard, 14, 18 
Entering triangular matrices, 31 
Entering variable names, 14 
EOF variable, 74 
EOG variable, 74 
Equal to (=), 73 
ERASE, 90, 95 
ERN function, 74 
Evaluation of expressions, 95-98 
EXP function, 73 
Exponential distribution functions, 74 
Exponential function, 73 
Exponentiation (^), 73 
Expressions, 95-98 
Character, 96-97 
Numeric, 96 
Relational, 97-99 
Extracting variables, 56 
Extraneous data values, 20 


F distribution, 74, 140 
F distribution functions, 74 
FCF function, 74 
Field formats, 27-30 
FIF function, 74 
File manipulation, 52-67 
File structure, 165-70 
Missing data, 167 
Precision, 167 
Record, 165-66 
Subroutines, 166-67 
Filenames (see Getting Started) 
Fixed-format input, 16, 17-30 
Formatting symbols, 28 
Formats, 27-30 
FOR...NEXT loop, 90, 101-105 
Conditional, 103-104 
Control, 102 
Nesting, 103 
With ELSE, 104-05 
With subscripted variables, 105-07 
Free-format input, 16, 17-27 
FRN, 74 
Functions, 73-75, 95 


Gamma distribution functions, 74 
GCF function, 74 

GET, 8 

GIF function, 74 

GOTO, 90, 108 

Greater than (>), 73 
Greater than or equal to (>=), 73 
GRN function, 74 


HOT commands, 8 
HOLD, 134, 143-44 
Hyperbolic arctangent, 73 


IF... THEN command, 70, 90 

IF... THEN...LET, 79, 99 

IF... THEN...ELSE format, 100-101 
Illegal values, 76 

IMPORT, 12 

Importing files from other applications, 33 
Incomplete records, 24-25 
Incrementing values of a variable, 144 
Inequalities, 73 

INPUT, 12 

Input formats, 27-30 

INT function, 73 

Integer truncation function, 73 
Inverse distribution functions, 74 


Keyboard input, 18 


LABEL, 70, 84-86 

LAG, 73, 86-87, 149-51 
Lagging variables (see LAG) 
LET, 70, 76-79, 91 

Less than (<), 73 

Less than or equal to (<=), 73 
LGM function, 73 

Line numbers, 94 

LIST, 38, 39-41 

Listing a SYSTAT data file, 39-41 
LOG function, 73 
Logarithms, 73 

Logging variables, 106-07 
Logical operators, 74 
Loops, 101-07, 108 


MAC, 46 
Matrix input, 31 
MAX function, 74 
Medians, computing, 119-20 
Memory limitations, 110 
Merging files horizontally, 61, 62-66 
By key variables, 62 
MIN function, 74 
MIS function, 74 
Missing values, 21-23, 75, 98-99 
Character, 16 
Evaluating expressions involving missing values, 75, 98-99 
Numeric, 15 
Multi-variable functions, 74 
Multinormal random variables, 141-42 
Multiple cases per record, 23-24 
Multiple LET Statements, 79 
Multiplication (*), 73 


Naming data files (see Getting Started) 
Naming variables, 15-16 
Natural logarithm function (see LOG) 
Nested loops, 103 
Nested sort, 118-19 
Normal distributions, 138-39 
Using URN, 138 
Using ZRN, 139 
With specified parameters, 139 
Normal distribution functions, 74 
Normalized scores, 123-24 
NOT, 74 
Not equal to (<>), 73 
Notation, 7 
NRAN (see ZRN) 
Numeric values, 14-15 
Numerical accuracy, 110 
Opening a SYSTAT data file (see Getting Started) 
Operations within rows, 135 

Operators, 73-75, 95 

OR, 74-75, 80 

Order of expression evaluation (see Order of operations) 
Order of operations, 74-75 

OUTPUT, 38, 42-43, 46, 47 


Placeholders, 7 

Printing data, 39-43 

PRINT, 38, 42, 47, 91, 108-09 
Processing subgroups, 126-31 
Programming, 88-110 
Programming examples, 132-53 
Prompt, 6 

PUT, 46, 47, 48 

Putting data into a text file, 48 


Quantiles, computing, 120 
Quotation marks, 16 


Random distribution functions, 74 
Random numbers, 74, 136-42 
Random subsamples, selecting, 142-43 
RANK, 114 
Ranking variables, 120, 120-24 
Large files, 122 
Normalized scores, 123 
Trimmed means, 122-23 
Winsorized means, 122-23 
Reading from a text file, 18-19 
Rearranging files, 55-60 
Recoding values, 82-84 
Record length, 23-24 
Records, 165 
Records with extraneous data values, 20 
RECTANGULAR matrix, 31 
Rectangular SYSTAT files, 31 
Re-expressing data, 77-78 


Relational expression, 97 
Relational operators, 73 
Reordering variables, 57 
REPEAT, 38, 109-10 
REPEAT...UNTIL loop, 108 
Rows, Operations within, 135 
RSEED, 134 

RUN, 13, 38 


SAVE, 13 

Saving data in text files, 44-51 
Saving selected cases, 48, 151 
Saving selected variables, 49 

Scientific notation, 15 

Selecting cases, 142-43 

SIMILARITY matrix, 31 

Simple data transformations, 76-87 

SIN function, 73 

SORT, 114 

Sorting variables, 115-20 
Nested sort, 118-19 

SQR function, 73 

Square root function, 73 

STANDARDIZE, 114, 123-24 

Standardizing variables, 123-24, 146-49 

STD function, 74 

STOP, 91 

Subgroup means, 131, 152-53 

Subgroup processing, 126-31 

Subscripted variables 
Computing means of, 135-36 

Subsets of cases, 48, 142-43 

Subsets of variables, 48 

Subtraction (-), 73 

SUM function, 74 

Summing a variable, 145 
SYSTAT BASIC, 88-110 
Editing, 94-95 
Erasing statements, 95 
Errors, 94 
Execution of, 92 
Expressions, 96-99 
Functions, 73-75 
Incrementing a variable, 144 
Line numbers, 94 
Memory limitations, 110 
Numerical accuracy, 110 
Operators, 73-75 
Random data samples, 142-43 
Random numbers, 74 
Saving your work, 93-94 
Standardizing, 124-25 
Statements, 95-96 
Subgroup means, 131 
Transformations, 68-87 
SYSTAT data files 
Creating, 44-51 
Deleting cases, 57-59 
Dropping variables, 55-57 
Listing, 39-41 
Merging, 62-66 
Naming, (see Getting Started) 
Opening, (see Getting Started) 
Printing, 39-43 
Saving, 93-94 
Selecting variables, 48 
Structure, 165-70 
Temporary data files, 72 
Transposing, 59-60 
Triangular, 31 
USEing, 38, 47, 54 
ASCII file reading, 33-35 


t distribution, 140 

t distribution functions, 9 
Tab delimited files, 47 
TAN function, 73 


Tangent function, 73 
TCF function, 74 
Temporary files, 72 

Temporary subscripts, 106 
TIF function, 74 
Transformed data, saving, 72, 93-94 
Transforming variables, 68-87 
TRANSPOSE, 54, 59-60 
Transposing a file, 59-60 
Triangular matrices, entering, 21-22 
Trigonometric functions, 73 
Trimmed means, 122-23 
TRN function, 74 
TYPE, 13 
Types of data, 14 

Character, 16 

Numeric, 14-15 


UCF function, 74 
UIF function, 74 
Unary minus function, 73 
Unequal length records, 19-20 
Uniform distribution, 137-38 
On (0,1), 137 
On (a,b), 138 
Uniform distribution functions, 74 
Uniform integers, 138 
Uniform random numbers, 137 
Unpacking records, 50-51 
URN function, 74, 137-38 
USE, 38, 47, 54 


Variable length records, 19-20 
Variables 
Changing types between numeric and character, 49 
Character, 16 
Dropping, 55-57 
Listing, 40 
Naming, 15-16 
Numeric, 14-15 


Selecting subsets of, 48 
Subscripted, 135-36 


WHILE loop, 108 
Winsorized means, 122-23 
Writing SYSTAT files, 44-51 


XCF function, 74 
XIF function, 74 
XRN function, 74 


ZCF function, 74 
ZIF function, 74 
ZRN function, 74, 137, 139 